INTRODUCTION:

Since  many  types  of  breast  cancer  remain  untreatable,  the  research  proposal  aims  to 
develop  novel  genomic  technology  to  identify  potential  therapeutic  targets  and  to  aid  in 
diagnosing  various  types  of  breast  cancer  at  the  molecular  level.  The  overarching  goal  of  the 
proposal  is  to  develop  a  technology  to  screen  nucleic-acid  protein  interactions  on  a  genome  scale 
with  a  focus  on  understanding  complexes  involved  in  breast  cancer.  In  order  to  identify  the 
regulatory  networks  of  interactions  between  RNAs  and  proteins,  we  proposed  to  develop  a  rapid 
genome-scale  method  to  determine  the  specific  RNA  targets  and  RNA  binding  sites  of  proteins. 
The  aims  were  to  1)  discover  RNA  targets  of  specific  RNA  binding  proteins  and  2)  define  the 
RNA  sequences  recognized  by  proteins  using  novel  nanotechnologies  including  development  of 
optically  encoded  beads  containing  both  a  unique  optical  signature  and  a  specific 
oligonucleotide.  This  technology  is  being  complemented  by  genome-wide  chromatin 
immunoprecipitation  and  RNA  profiling  on  exon  arrays.  Progress  during  the  past  year  has  been 
made  on  Aims  2,  3  and  4  of  the  original  grant  as  detailed  below.  Two  papers  mapping  proteins 
to  the  genome  have  been  submitted  for  publication  while  a  third  paper  is  in  preparation. 


BODY: 

Aim  1  was  successfully  completed  as  marked  by  the  publication  of  a  paper  describing  the 
assay  (Brodsky  and  Silver,  2002).  Thus,  we  have  laid  the  groundwork  for  genomic  and  small 
molecule  screening  using  the  microbead  assay.  The  assay  has  generated  some  interest  in  the 
community  as  we  have  recently  written  an  invited  review  discussing  the  microbead  assay  we 
have  developed  [1]  (see  attached).  Although  the  goals  remain  the  same,  we  are  now  using  a  new 
and  potentially  more  powerful  method  than  originally  proposed  in  Aim  2.  The  research 
accomplishments  associated  with  each  task  outlined  in  the  approved  Statement  of  Work  are 
detailed  below. 

Technical  Objective  1:  Determine  optimal  conditions  for  bead-based  genomic  screening. 

The  goals  of  this  objective  were  completed  as  outlined  in  a  published  paper  [1]  (see 
attached).  We  were  also  invited  to  write  a  review  highlighting  this  new  technology  to  probe 
RNA-protein  interactions  [2]  (see  attached). 

Technical  Objective  2:  Identification  of  target  RNAs  of  clinically  important  proteins. 

While the new microbead technology is being developed, current microarray
technologies can be used to determine candidate binding targets. Recently, chromatin
immunoprecipitation (ChIP) has emerged as a powerful method to identify where on a gene and,
in combination with microarrays (ChIP-chip), on which genes chromatin-associated proteins
are binding [3, 4]. Briefly, cells are cross-linked and chromatin is sheared to an average
size of approximately 1,000 bp. The protein of interest is immunoprecipitated and the DNA is
isolated for quantitative PCR analysis or microarrays. Because our lab, as well as others, has
shown that many RNA binding proteins bind co-transcriptionally, we can take advantage of this
approach.

In addition, a potentially significant advantage of ChIP is that the cells are formaldehyde
cross-linked, which allows the capture of dynamic interactions and better represents the
in vivo situation. We have modified and improved the ChIP approach to localize RNA binding
proteins (RBPs) on various genes. Our modifications include the use of a second
protein-protein crosslinker in addition to the commonly used formaldehyde. We also use
log-linear fitting of real-time PCR data, which enhances the sensitivity and dynamic range of
the analysis [5]. Figure 1 shows enrichments of two RNA binding proteins and two states of
RNA Polymerase II across the PTB gene. We observe RNA binding proteins at the 5’ end of genes
as well as at sites of alternative splicing. Similarly, we observe hypophosphorylated RNA
Polymerase II at the 5’ end of the PTB gene but not at other locations of the PTB gene.
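
The log-linear fitting of real-time PCR data mentioned above can be illustrated with a minimal sketch. It assumes the standard exponential-phase model F(c) = F0 * E^c, so that log10 F is linear in the cycle number c; the function names, the synthetic data and the use of an IP/input ratio are illustrative assumptions and are not taken from reference [5].

```python
import numpy as np

def fit_log_linear(cycles, fluorescence):
    """Least-squares fit of log10(F) = log10(F0) + c * log10(E).

    Only exponential-phase points should be passed in (an assumption of
    this sketch; choosing that window is not shown here).
    """
    cycles = np.asarray(cycles, dtype=float)
    log_f = np.log10(np.asarray(fluorescence, dtype=float))
    slope, intercept = np.polyfit(cycles, log_f, 1)
    efficiency = 10.0 ** slope   # amplification factor per cycle (2.0 = perfect doubling)
    f0 = 10.0 ** intercept       # extrapolated starting signal at cycle 0
    return efficiency, f0

def fold_enrichment(f0_ip, f0_input):
    """Ratio of starting template in the ChIP sample relative to input."""
    return f0_ip / f0_input

# Synthetic exponential-phase readings (hypothetical values).
cycles = np.arange(15, 21)
ip_signal = 1e-4 * 1.9 ** cycles
input_signal = 2e-5 * 1.9 ** cycles

eff_ip, f0_ip = fit_log_linear(cycles, ip_signal)
eff_in, f0_in = fit_log_linear(cycles, input_signal)
ratio = fold_enrichment(f0_ip, f0_in)
```

Fitting the starting quantity directly, rather than comparing threshold cycles under an assumed efficiency of 2, is one way such a fit could extend the usable dynamic range.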

Figure 1. ChIP assay reveals patterns of RNA binding protein and RNA
Polymerase II enrichment across genes. A series of primers A. The two
RNA binding proteins PTB and U2AF65 are found at the 5’ end and at
alternatively spliced exons. Hypophosphorylated RNA Polymerase II (PolIIa)
is found only around the transcription initiation sites. Phosphorylated RNA
Polymerase II is found at the transcription initiation sites, the alternatively
spliced exons, as well as the 3’ end of the gene. These latter locations are
regions where transcription is coupled to pre-mRNA processing.

Figure 2. Summary of factor enrichments across the

Importantly, we have shown that the sites of enrichment are regulated by PTB and
U2AF65. Figure 2 shows the enrichment of PTB and U2AF65 across the MDM2 oncogene.
MDM2 is an important oncogene mutated in many tumors, and tumor-specific isoforms have
been identified [6]. Knockdown experiments using siRNAs against PTB and U2AF65 suggest
that the exon levels of some of these enriched regions are regulated by PTB and U2AF65, as
determined by RNase protection assays (RPA). Interestingly, the region around the m5 and m6
primers is in the vicinity of the nuclear localization sequence, suggesting that PTB and U2AF65
may be regulating the inclusion or exclusion of these protein sequences. Thus, these RNA
binding proteins may be controlling the cytoplasmic localization of MDM2. These studies are
providing insight into some of the factors involved in the complicated post-transcriptional
regulation of MDM2.

We  have  also  found  that  PTB  and  U2AF65  are  recruited  to  a  wide  variety  of  genes  at  the 
5’  end.  We  observe  PTB  and  U2AF65  at  the  5’  end  of  intronless  genes  such  as  histones, 
constitutively  spliced  genes  such  as  actin,  and  alternatively  spliced  genes.  In  Technical 
Objectives  3  and  4,  we  describe  microarray  data  across  hundreds  of  genes  where  we  observe  the 
generality  of  these  observations. 

In  sum,  these  data  demonstrate  that  ChIP  of  RNA  binding  proteins  in  mammalian  cells  is 
feasible.  Furthermore,  we  find  that  the  sites  of  recruitment  are  locations  of  regulation  of  gene 
expression. 

Technical  Objective  3:  Screen  RNA-protein  interactions  of  a  targeted  set  of  genes. 

To determine the binding profile of a larger number of genes, we designed and
constructed our own microarrays. Open Reading Frames
(ORFs)  were  cloned  from  cDNA  libraries,  sequence  verified  and  spotted  onto  slides.  This  array 
includes  a  number  of  genes  relevant  to  breast  cancer  including  MDM2  and  numerous  kinases. 
ChIP-chip  experiments  with  these  ORF  microarrays  identified  a  number  of  new  putative  targets 
for  PTB  including  SNK,  DAPK3  and  MDM2.  These  genes  are  rich  in  alternative  splicing.  We 
have  verified  DAPK3  and  MDM2  by  PCR  analysis.  Interestingly,  DAPK3  shows  approximately 
10-fold  stronger  enrichment  at  its  3’  end  than  we  have  observed  for  any  other  region  for  PTB. 

With the rapid advances in microarray technology over the past couple of years, we are
now reaching genome scale. We are part of an early access program to use Affymetrix tiled
arrays.  These  arrays  include  a  25mer  oligonucleotide  probe  every  20  base  pairs  across  the 
ENCODE  regions  [7]. 

In  order  to  learn  how  to  perform  ChIP-chip  experiments  and  develop  analysis  tools,  we 
first  analyzed  the  localization  of  two  states  of  RNA  Polymerase  II.  These  data  have  been 
submitted  for  publication  and  the  manuscript  is  included  in  the  appendix. 

We  made  a  number  of  technical  advances  during  this  work  which  helped  us  improve  the 
quality  of  the  data.  These  include  a  new  random  primer  amplification  method.  We  have  also 
developed  a  number  of  analysis  tools  which  we  expect  to  make  generally  available  including  the 
analysis  of  constitutive  and  alternatively  spliced  exons.  This  analysis  allowed  us  to  discover  that 
hyperphosphorylated  RNA  Polymerase  II  accumulated  more  often  at  alternatively  spliced  exons 
(see  attached  manuscript). 

We  are  analyzing  ENCODE  ChIP-chip  data  for  a  number  of  RNA  binding  proteins  with  a 
variety  of  putative  functions.  Herein,  we  will  highlight  data  for  two:  PTB  and  Aly.  Aly  is  also 
known  as  REF1,  RNA  Export  Factor  1,  and  is  a  putative  RNA  export  factor  and  part  of  the  exon 
junction  complex. 

Figure 3. RNA Polymerase II, PTB and ALY localization across the KIAA1932 gene.
The black bars represent regions of significant enrichment. The different known
gene transcripts are in blue. PCR data are summarized in red with plus signs indicating
significant enrichment. The PCR and array data correlate. Note that PTB localizes to
alternative exons while Aly appears to be biased towards the 3’ end of the gene.

Figure 4. Venn diagram showing the overlap between ALY, PTB and Pol II. 73% of PTB
sites overlap with Pol II while only 30% of ALY sites overlap with Pol II. Thus, many
ALY sites are not associated with RNA Polymerase II transcription.
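
Overlap percentages of the kind reported for Figure 4 can be computed from two lists of enriched regions. The sketch below is only an illustration: it assumes each site is a (chromosome, start, end) half-open interval, and the coordinates and function names are hypothetical rather than taken from our data or analysis tools.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def fraction_overlapping(sites, reference):
    """Fraction of `sites` that overlap at least one interval in `reference`."""
    if not sites:
        return 0.0
    hits = sum(1 for s in sites if any(overlaps(s, r) for r in reference))
    return hits / len(sites)

# Hypothetical coordinates, for illustration only.
ptb_sites = [("chr21", 100, 200), ("chr21", 500, 600),
             ("chr22", 50, 150), ("chr22", 900, 950)]
pol2_sites = [("chr21", 150, 300), ("chr21", 590, 700), ("chr22", 100, 120)]

frac = fraction_overlapping(ptb_sites, pol2_sites)
```

Note that the measure is asymmetric: the fraction of PTB sites overlapping Pol II generally differs from the fraction of Pol II sites overlapping PTB.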




An  emerging  theme  is  a  growing  family  of  dual  activity  proteins  that  bind  RNA  and  also 
regulate  transcription.  Furthermore,  some  RNA  binding  proteins,  such  as  hnRNPK,  have  been 
associated  with  transcription  control  [8].  We  have  mapped  PTB  and  Aly  to  certain  promoter 
regions  but  not  others.  Thus,  for  the  first  time,  we  can  design  experiments  to  probe  the  role  of 
these  proteins  in  transcription  control  using  a  luciferase  reporter  assay. 

In  sum,  these  data  are  not  only  identifying  known  genes  that  these  RNA  binding  proteins 
may  be  regulating  but  also  providing  new  insights  into  how  RNA  binding  proteins  may  be 
interacting  with  the  genome.  For  example,  many  Aly  sites  are  not  in  annotated  gene  regions. 

One  hypothesis  is  that  these  sites  may  be  involved  in  transcription  with  RNA  Polymerase  I  or  III. 
We  are  currently  exploring  what  the  function  of  these  sites  may  be. 

Our interest in RNA binding proteins and their potential role in breast cancer led us to
collaborate with Myles Brown’s group to explore how Estrogen Receptor alpha (ER)
interacts with the genome. Our overlapping interests in developing ChIP-chip technology led
us to work together to map ER across chromosomes 21 and 22. A paper has been submitted for
publication and is included in the appendix.

Technical  Objective  4.  Analyze  RNA-protein  interactions  on  a  genomic  scale. 

Whole-genome Affymetrix tiled arrays became available to us in late 2004. As a
first step towards developing analysis tools to handle this very large scale of data, we localized
estrogen receptor and RNA Polymerase II in a breast cancer epithelial cell line (MCF-7). Figure 5
shows that we identify thousands of ER and Pol II sites across the human genome and
summarizes the number and kinds of sites we observe.

Figure 5. Pie charts show the distribution of the ER and Pol II on different types of exons
across all the non-repetitive DNA in the human genome. Few ER sites are in annotated
promoter regions while many of the Pol II sites are.


Mapping Pol II on tiled arrays in estrogen-treated breast cancer cells gives a list of the genes
being actively transcribed as well as novel intergenic regions that may be expressed.
Interestingly, we identify 5 microRNAs with significant ER enrichment nearby. Only two of
these show significant enrichment of both Pol II and ER: mir-223 and mir-152. Current
experiments are exploring whether the levels of these and other microRNAs are regulated in an
estrogen-dependent manner.

To  complement  these  genomic  location  studies,  we  are  profiling  potential  splicing 
changes  in  response  to  estrogen  using  Affymetrix  exon  arrays  as  part  of  a  collaboration  with 
Affymetrix.  These  arrays  include  probes  targeting  every  known  exon  in  the  human  genome. 

The cell’s response to estrogen remains poorly understood despite significant effort
exploring transcriptional control using standard gene expression microarrays. Early efforts
exploring the alternative splicing response suggest that the genes changing in their exon usage
are not the same genes whose mRNA levels are changing. Thus, alternative splicing is an
important part of defining a cell’s expression program. Thousands of exons are found to be
significantly changing, including a number of exons in known estrogen-responsive genes such as
myc, the BCL-1 oncogene, and the stromal cell-derived factor 1 precursor (SDF-1).

Interestingly,  these  genes  are  generally  involved  in  regulating  proliferation.  Thus,  we  are 
building  a  network  of  genes  and  exons  that  may  be  part  of  a  breast  cancer  cell’s  response  to 
estrogen.  Finally,  this  network  will  be  compared  to  the  ChIP-chip  data  of  ER  and  RNA  binding 
proteins  to  understand  the  regulation  of  gene  expression  at  the  post-transcriptional  level  in 
response  to  estrogen. 


KEY RESEARCH ACCOMPLISHMENTS

•  Developed  the  chromatin  IP  approach  to  localize  RNA  binding  proteins  to  the  human 
genome. 

•  Identified  new  gene  targets  of  RNA  binding  proteins  in  the  ENCODE  regions. 

•  Representative  splicing  factors  PTB  and  U2AF65  are  recruited  to  many  genes  at  the  5’ 
end  but  have  exon  specificity  within  the  gene. 

•  Localized  two  states  of  RNA  Polymerase  II  to  the  ENCODE  regions  in  HeLa  cells. 

•  A major regulator of transcription elongation is coupling to pre-mRNA processing.

•  Identified  novel  sites  of  transcription  in  the  ENCODE  regions 

•  Localized  RNA  Polymerase  II  to  the  whole-genome  in  breast  cancer  cells. 

•  Identifying  novel  sites  of  transcription  in  response  to  estrogen. 

•  Discovered  novel  sites  of  Estrogen  Receptor  regulation  across  the  human  genome 
including  potential  microRNAs. 

•  Discovering  novel  genes  that  respond  to  estrogen  at  the  level  of  alternative  splicing. 

All microarray data files are being deposited in the Gene Expression Omnibus (GEO) database.

CONCLUSIONS 

We  have  developed  an  approach  to  examine  the  role  of  RNA  binding  proteins  in  post- 
transcriptional  regulation  in  human  cells.  We  have  applied  this  technology  to  understand  the 
post-transcriptional  regulation  of  important  oncogenes  such  as  MDM2  as  well  as  the  cell’s 
response to estrogen. A number of groups have found that the mRNA levels of only ~100 genes
change upon estrogen stimulation in breast cancer cells [9]. Furthermore, there are no clear
patterns amongst these 100 genes that explain how the cell is responding to estrogen. Thus, we
have  developed  a  platform  to  examine  the  role  of  RNA  binding  proteins  and  post-transcriptional 
regulation  in  breast  cancer  cells.  The  role  of  RNA  binding  proteins  in  cancer  remains  unclear 
but  learning  which  RNA  binding  proteins  may  be  regulating  oncogenes  and  estrogen  responsive 
genes may provide clues to the role of post-transcriptional mechanisms. We have learned that
RNA Polymerase accumulates at exons across genes and we are identifying novel sites of
transcription  in  breast  cancer.  These  include  regions  around  microRNAs  and  other  noncoding 
RNAs. 

Thus,  during  this  grant  funding  period,  we  have  developed  genomic  approaches  to 
localize  RNA  binding  proteins  to  the  genome  and  measure  post-transcriptional  responses.  We 
have  begun  learning  about  the  role  of  RNA  binding  proteins  and  post-transcriptional  regulation 
in  breast  cancer  cells.  These  data  are  providing  new  insights  into  the  regulation  of  gene 
expression  in  breast  cancer  cells. 

Chromosome-wide Mapping of Estrogen Receptor Binding Reveals Long-range
Combinatorial Regulation Requiring Forkhead Proteins

Mapping  of  estrogen  receptor  binding  to  chromosomes  21  and  22  using  chromatin 
immunoprecipitation  and  tiled  microarrays  reveals  the  importance  of  Forkhead  factors  in 
estrogen-regulated  gene  expression. 

Estrogen plays an essential physiologic role in reproduction and a pathologic one in
the development and progression of breast and endometrial cancers. The
completion of the human genome has allowed the identification of the expressed
regions of almost all protein-coding genes; however, little is known concerning the
organization of their cis-regulatory elements. We have mapped the association of the
estrogen  receptor  (ER)  with  the  complete  non-repetitive  sequence  of  human 
chromosomes  21  and  22  by  combining  chromatin  immunoprecipitation  (ChIP)  with 
tiled  microarrays.  ER  binds  selectively  to  a  limited  number  of  sites,  the  majority  of 
which  are  distant  (often  greater  than  100  kb)  from  the  transcription  start  site  of 
regulated  genes.  Surprisingly,  the  unbiased  sequence  interrogation  of  the  pool  of 
genuine  chromatin  binding  sites  suggests  that  direct  ER  binding  through  canonical 
EREs  and  ERE  half-sites  requires  in  addition  the  presence  of  Forkhead  factor 
binding  in  close  proximity  to  ER  binding.  Furthermore,  knockdown  of  Forkhead 
factor  expression  blocks  the  association  of  ER  with  chromatin  and  estrogen-induced 
gene  expression  demonstrating  the  necessity  for  combinatorial  interaction  between 
these  two  signaling  pathways  in  mediating  an  estrogen  response. 

Introduction 

Estrogen  is  an  essential  regulator  of  female  development  and  reproductive  function  and 
has  been  implicated  as  a  causal  factor  in  breast  and  endometrial  cancers.  Estrogen- 
regulated  gene  expression  is  mediated  by  the  action  of  two  members  of  the  nuclear 
receptor family, ERα and ERβ, with ERα being dominant in both breast epithelial cells
and in breast cancer. Significant progress has been made over the past decade in defining
the complex interactions between chromatin and an array of factors involved in ER-
mediated gene expression (Halachmi et al., 1994; Metivier et al., 2003; Shang and
Brown, 2002; Shang et al., 2000). These include the cyclic association of ER, p160
coactivators  (such  as  AIB-1),  histone  acetyl  transferases  (HAT)  and  chromatin  modifying 
molecules,  such  as  p300/CBP  and  p/CAF,  with  target  promoters  in  an  ordered  temporal 
fashion  (Metivier  et  al.,  2003;  Shang  et  al.,  2000). 

In  addition,  a  number  of  strategies  including  most  recently  gene  expression  profiling  on 
microarrays  have  identified  potential  ER  target  genes  in  human  breast  cancer  cells.  Of 
these  genes  the  cis-element  targeted  directly  by  ER  has  been  identified  for  only  a  small 
subset.  Estrogen  Responsive  Elements  (ERE)  have  been  identified  within  the  1  kb  5’- 
proximal  region  of  the  estrogen-regulated  genes  TFF-1  (pS2),  EBAG9  and  Cathepsin  D 
(Augereau  et  al.,  1994;  Berry  et  al.,  1989;  Ikeda  et  al.,  2000).  The  proximal  promoters  of 
target  genes  that  lack  EREs,  including  c-Myc  and  IGF-I,  contain  AP-1  and  Sp-1  sites  that 
appear  essential  for  transcription  in  in  vitro  reporter  assays  (Dubik  and  Shiu,  1992; 
Umayahara et al., 1994). Few, if any, regulatory elements at significant distances from the
mRNA start sites of target genes have been shown to be directly targeted by ER, and
computational approaches to identify novel ER-binding domains have focused primarily on
gene proximal regions (Bajic and Seah, 2003; Bourdeau et al., 2004).

In contrast, a wealth of studies on β-globin gene regulation has contributed to our
understanding of general mechanisms of transcriptional regulation and has shown that
Locus Control Regions (LCR) up to 25 kb from the gene are capable of enhancing gene
transcription (recently reviewed in Bulger et al., 2002). In this study we have
undertaken  an  unbiased  approach  to  identify  all  regulatory  regions  that  may  play  a  role  in 
ER-mediated  transcription,  by  combining  chromatin  immunoprecipitation  (ChIP) 
analyses  of  in  vivo  ER-chromatin  complexes  with  Affymetrix  tiled  oligonucleotide 
microarrays  that  cover  the  entire  non-repetitive  sequences  of  chromosomes  21  and  22, 
including, importantly, all the intergenic regions. Most previous ChIP-microarray
studies have focused primarily on promoter regions (Kapranov et al., 2002; Odom et al.,
2004) or CpG islands, which represent promoter-rich sequences (Weinmann et al., 2002).
The tiled arrays used here are composed of 25 bp probes located at 35 nucleotide
resolution (Cawley et al., 2004; Kapranov et al., 2002) and permit the interrogation of
previously unexplored regions of chromosomal DNA. The 780 characterized
or predicted genes on chromosomes 21 and 22 represent about 2% of the total number of
genes (Kapranov et al., 2002) and thus provide a representative model for the unbiased
identification of paradigms of gene regulation by ER.

Here  we  find  a  discrete  number  of  ER  binding  sites  across  chromosomes  21  and  22, 
almost  all  of  which  are  in  non-promoter  proximal  regions.  We  explored  underlying 
biological  patterns  within  the  list  of  genuine  chromatin  interacting  domains  and  identified 
common  motifs  highly  enriched  in  these  regions.  Using  this  information  we  prove  that 
the  distal  ER  binding  sites  are  discrete  chromatin  regions  involved  in  transcriptional 
regulation  and  that  Forkhead  proteins,  at  these  sites,  are  required  for  activity  by  the  ER. 


Results 


ER occupies a limited number of binding sites on chromosomes 21 and 22

Estrogen-dependent  MCF-7  breast  cancer  cells  were  deprived  of  hormones  and 
stimulated  with  estrogen  or  vehicle  for  45  minutes,  a  time  we  have  previously  shown  to 
have  maximal  recruitment  of  ER  to  the  promoters  of  several  known  gene  targets, 
including  Cathepsin  D  and  TFF-1  (Shang  et  al.,  2000).  Following  ChIP,  ER-associated 
DNA  was  amplified  using  non-biased  conditions,  labelled  and  hybridized  to  the  tiled 
microarrays.  Relative  confidence  prediction  scores  were  generated  by  quantile 
normalization  across  each  probe  followed  by  an  analysis  using  a  two-state  Hidden 
Markov  model  (Rabiner,  1989).  These  scores  included  probe  intensity  and  width  of  probe 
cluster.  Triplicate  experiments  eliminated  stochastic  false  positives  after  which  peaks  that 
appeared  at  least  twice  in  the  three  replicates  were  included.  Real  time  PCR  primers  were 
designed  against  numerous  peaks  in  the  list  and  directed  ER  ChIP  was  conducted  to 
identify  the  boundary  between  the  true  ER  binding  peaks  (>  1.5  fold  enrichment  over 
input)  and  the  false  positives  (data  not  shown).  Following  filtering,  the  final  list  contained 
a total of 57 estrogen-stimulated ER binding sites within 32 discrete clusters (Fig 1A, 1B
and supplemental data 1).
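
The analysis steps described above (quantile normalization, a two-state hidden Markov model, and retention of peaks seen in at least two of three replicates) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the Gaussian emission model, the state means, the transition probability and all function names are hypothetical choices.

```python
import numpy as np

def quantile_normalize(x):
    """Give every array (column) the same intensity distribution.

    Rows are probes and columns are arrays; ties are ranked arbitrarily.
    """
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)   # rank of each value in its column
    mean_sorted = np.sort(x, axis=0).mean(axis=1)       # mean of the k-th smallest values
    return mean_sorted[ranks]

def viterbi_two_state(signal, means=(0.0, 3.0), sd=1.0, p_stay=0.9):
    """Most likely path of background (0) / enriched (1) states.

    Assumes Gaussian emissions with a shared standard deviation and a
    symmetric transition matrix; requires a non-empty signal.
    """
    signal = np.asarray(signal, dtype=float)
    means = np.asarray(means, dtype=float)
    log_emit = -0.5 * ((signal[:, None] - means[None, :]) / sd) ** 2
    log_trans = np.log(np.array([[p_stay, 1 - p_stay],
                                 [1 - p_stay, p_stay]]))
    n = len(signal)
    score = np.empty((n, 2))
    back = np.zeros((n, 2), dtype=int)
    score[0] = np.log(0.5) + log_emit[0]
    for t in range(1, n):
        cand = score[t - 1][:, None] + log_trans   # cand[i, j]: state i -> state j
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[t]
    path = np.empty(n, dtype=int)
    path[-1] = score[-1].argmax()
    for t in range(n - 2, -1, -1):                 # trace back the best path
        path[t] = back[t + 1, path[t + 1]]
    return path

def reproducible(calls, min_support=2):
    """Keep probes called enriched in at least `min_support` replicates."""
    return np.asarray(calls, dtype=bool).sum(axis=1) >= min_support

# Hypothetical inputs, for illustration only.
raw = [[5, 4, 3], [2, 1, 4], [3, 6, 6], [4, 2, 8]]
normalized = quantile_normalize(raw)
path = viterbi_two_state([0, 0, 0, 3, 3, 3, 0, 0])
keep = reproducible([[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]])
```

In this sketch the width of a probe cluster corresponds to the length of a run of enriched states in the decoded path.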

As  one  example  of  the  validity  of  this  method,  the  localization  of  ER  to  the  proximal 
promoter  400  bp  region  of  the  estrogen-regulated  gene,  TFF-1,  was  observed.  A 
functional  ERE  had  been  previously  mapped  to  the  region  393  to  405  bp  upstream  from 
the transcription start site of TFF-1 (Berry et al., 1989). Furthermore, a region 10.5 kb
upstream of the TFF-1 transcription initiation site (Fig 1A) was also found to be bound
by ER. Interestingly, an estrogen-inducible DNase I hypersensitive site has been
previously  mapped  10.5  kb  upstream  from  the  TFF-1  start  site  (Giamarchi  et  al.,  1999), 
though  the  region  had  not  been  further  characterized.  Our  data  now  define  this  region  as 
an  authentic  ER-binding  site. 

Within  the  small  list  of  32  ER-binding  clusters,  we  observed  interaction  with  a  number  of 
genes  previously  implicated  as  estrogen  targets,  including  the  transcription  factor  XBP-1, 
DSCAM-1  and  the  nuclear  receptor  co-regulator  NRIP-1  (Cavailles  et  al.,  1995;  Pedram 
et  al.,  2002;  Wang  et  al.,  2004).  Binding  sites  were  also  observed  within  200  kb  from 
genes  not  previously  implicated  as  estrogen  targets,  including  SOD-1,  a  superoxide 
dismutase  gene  involved  in  scavenging  oxygen  free  radicals  (Beckman  et  al.,  1993;  Singh 
et  al.,  1998)  and  implicated  in  tamoxifen-resistant  progression  in  MCF-7  xenografts 
(Schiff et al., 2000). None of these genes recruited ER to a proximal 5’ promoter region;
instead, they showed divergent patterns of association. The XBP-1 gene recruited ER to three
distinct and discrete regions 13.2 kb to 22.9 kb upstream of the transcription start site
(Fig 1B). DSCAM-1 contained a clustering of ten intronic ER binding sites, more than
0.5  Mb  from  the  transcription  initiation  site.  NRIP-1  contained  four  ER-binding  sites  in  a 
region  of  chromosome  21  well  known  for  its  scarcity  of  genes  (Katsanis  et  al.,  1998).  5’ 
RACE  was  performed  on  NRIP-1  to  determine  the  exact  location  of  the  transcription  start 
site  and  to  identify  the  distance  between  the  ER  binding  sites  and  the  genuine 
transcriptional  start  site.  Sequencing  of  the  5’  terminus  of  the  NRIP-1  transcript  after 
estrogen  stimulation  revealed  the  presence  of  two  previously  missed  exons  for  NRIP-1, 
74.96 kb and 97.39 kb from the previously annotated gene start site (data not shown).
Therefore, the ER-binding domains exist 107 to 144 kb from the genuine transcription
start  site  of  NRIP-1.  The  locations  of  all  binding  sites  in  relation  to  genes  can  be  found  in 
supplementary  data  4. 

The  ER  binding  sites  adjacent  to  TFF-1,  XBP-1,  SOD-1,  NRIP-1,  and  DSCAM-1  were 
validated  by  ER  ChIP  and  standard  PCR  (Fig  2A-E).  Also,  quantitative  PCR  was 
performed  on  each  of  these  sites  after  ER  ChIP  (Fig  2F)  confirming  these  putative  in  vivo 
binding  sites  as  genuine  ER  binding  sites.  To  test  whether  these  discrete  ER  recruitment 
regions  are  unique  to  estrogen  action  in  MCF-7  cells,  we  performed  ER  ChIP  and 
directed  real  time  PCR  against  the  same  sites  in  T47-D  breast  cancer  cells.  These  data 
confirm  that  the  majority  of  the  sites  identified  in  MCF-7  cells  are  also  regions  of 
estrogen dependent ER-binding in a second ER-positive breast cancer cell line (data not
shown)  highlighting  the  conservation  of  specific  ER-chromatin  association  sites. 

A  significant  number  of  ER  binding  sites  reside  adjacent  to  estrogen  gene  targets 

Estrogen-mediated  transcript  changes  were  identified  by  converting  RNA  from  vehicle  or 
estrogen-stimulated  MCF-7  cells  into  double  stranded  cDNA  and  hybridizing  to  the 
chromosome  21  and  22  tiled  microarrays.  35  genes  (4.4%  of  all  genes)  appeared  to  be 
transcribed,  after  which  real  time  primers  were  made  against  all  these  transcripts  and 
quantitative  RT-PCR  showed  that  12  transcripts  on  chromosomes  21  and  22  were 
estrogen  induced  (Table  1).  Eleven  of  these  12  genes  had  ER  binding  clusters  within  200 
kb.  The  only  estrogen-regulated  gene  that  did  not  have  an  adjacent  ER  binding  cluster 
was ATP5J. TFF-1, XBP-1 and NRIP-1 were in the small list of 1.5% of genes up-regulated
following estrogen stimulation (supplemental data 1). DSCAM-1 and SOD-1
were  not  upregulated  by  estrogen  stimulation  at  the  3  hr  time  point  assessed,  but  were 
transcribed  after  6  hr  of  estrogen  stimulation,  as  determined  by  RT-PCR  (supplemental 
data  3). 
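
The association of estrogen-induced genes with ER binding clusters within 200 kb can be expressed as a simple distance test. The sketch below assumes that distance is measured from a single reference coordinate per gene (for example the transcription start site) to a single coordinate per binding site; the gene names, coordinates and function names are hypothetical placeholders, not values from this study.

```python
def has_site_within(gene, sites, window=200_000):
    """True if any binding site lies within `window` bp of the gene coordinate.

    `gene` is (chrom, position); each site is (chrom, position).
    """
    chrom, pos = gene
    return any(c == chrom and abs(p - pos) <= window for c, p in sites)

# Hypothetical genes and binding sites, for illustration only.
genes = {
    "GENE_A": ("chr21", 1_000_000),
    "GENE_B": ("chr21", 5_000_000),
    "GENE_C": ("chr22", 2_000_000),
}
er_sites = [("chr21", 1_150_000), ("chr22", 2_300_000)]

near_er = {name: has_site_within(g, er_sites) for name, g in genes.items()}
n_with_site = sum(near_er.values())
```

Counting the genes for which the test is true gives the numerator of a statement such as "eleven of these 12 genes had ER binding clusters within 200 kb".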

The  delay  between  ER  association  and  transcription  of  DSCAM-1  and  SOD-1  may  be  a 
consequence  of  a  requirement  for  subsequent  modification  of  the  receptor  complex  or  the 
requirement  for  the  production  of  other  factors  involved  in  ER  action  but  not  necessarily 
part  of  an  ER  complex.  Regardless  of  the  mechanism  for  the  transcriptional  delay,  it  now 
appears  that  early  and  at  least  some  delayed  estrogen-regulated  genes  recruit  the  receptor 
with  the  same  kinetics.  This  implies  that  events  subsequent  to  ER  binding  are  responsible 
for  timing  the  initiation  of  transcription  of  these  delayed  targets. 

Distal  ER  binding  domains  function  as  transcriptional  enhancers 

The  significant  sequence  distance  between  many  of  the  ER  binding  sites  and  the  putative 
target  gene  complicates  their  functional  validation.  However,  we  explored  the  possibility 
that  these  ER  binding  sites  may  recruit  components  indicative  of  transcriptional 
activation.  RNA  PolII  ChIP  followed  by  real-time  PCR  was  performed  on  a  subset  of  the 
putative  regulatory  regions  adjacent  to  TFF-1,  XBP-1,  DSCAM-1,  NRIP-1  and  SOD-1 
genes.  Interestingly,  RNA  PolII  association  was  seen  with  all  of  these  sites  in  an 
estrogen-dependent manner (Fig 2F). Furthermore, ChIP of AIB-1, an oncogenic ER
coactivator (Kuang et al., 2004; Torres-Arzayus et al., 2004), confirmed that AIB-1 is
also present on all of these ‘regulatory’ sites following estrogen exposure (Fig 2F). As
negative controls, primers were designed against the intergenic region between the TFF-1
promoter and enhancer and against a region 7 kb from XBP-1 enhancer 3. Neither ER nor
any of the other factors was found associated with these control regions. In addition, we
examined the promoter of XBP-1. Although ER protein association was not observed at
the XBP-1 promoter, RNA PolII was found enriched at this site, supporting the hypothesis
that XBP-1 is transcriptionally activated by ER.

To explore the possibility that the distal enhancer regions not only function as sites of
protein recruitment, but physically play a role during transcription of the adjacent gene,
we performed a chromosome capture assay (Dekker et al., 2002) to assess whether
promoter and enhancer sequences were components of the same chromatin regions.
Hormone depleted MCF-7 cells were stimulated with vehicle or estrogen and the fixed
chromatin was digested with a specific restriction enzyme (BtgI), followed by ER ChIP
and ligation. After ligation, the ligated chromatin mix was washed and the cross-linking
was reversed. One primer in the TFF-1 promoter and one primer in the TFF-1 enhancer
were used to amplify potentially ligated fragments of DNA by PCR (Horike et al., 2005).
As seen in Fig 3A, TFF-1 promoter and enhancer DNA was ligated together only in the
presence of estrogen, confirming that estrogen-mediated transcription of TFF-1 involves
direct physical interaction between the enhancer and promoter. No interaction was seen in
the no digestion control or no ligation control. We performed the same experiment using
the BsmI restriction enzyme that cuts the genuine NRIP-1 promoter (as determined by 5’
RACE) and enhancer 3 region. Remarkably, after ligation, we were able to amplify a 1 kb
fragment that corresponded to the annealed promoter-enhancer regions using one
promoter specific and one enhancer specific primer (Fig 3B). This estrogen-dependent
interaction of the distal (144 kb) ER-binding site with the promoter of the NRIP-1 gene
confirms the authenticity of these distal sites as transcriptional regulatory domains.

The finding that RNA PolII is recruited to the majority of ER binding sites, even those
removed from known transcription sites, led us to investigate the possibility that these
binding sites can function as genuine enhancers. To this end, we cloned 23 ER sites
(40% of all ER binding sites) into a pGL-3 luciferase vector containing an SV40
promoter and transfected these vectors into hormone depleted MCF-7 cells, which were
subsequently treated with estrogen or vehicle control. pGL3 empty vector was used as a
negative control and transfections were normalized with pRL-Null. Almost 75% of the
ER binding domains contained estrogen-induced enhancer characteristics in an in vitro
transcription model (Fig 3C), supporting the hypothesis that the distal binding sites play
transcriptional regulatory roles.

ER  binding  sites  are  conserved  across  species 

To determine whether the ER binding sites are conserved between the human and mouse
genomes, we assessed sequence identity in a window of 6 kb around the center of all 57
ER binding sites. This conservation was mapped within a 500 bp window at single
nucleotide resolution and confirms a strong conservation at the center of the ER binding
site and the 500 bp on either side of the middle of the peak (Fig 4A). However,
conservation decreased to background levels at a distance of 1 kb or more from the center
of the ER binding sites. This supports the hypothesis that the discrete ER binding sites
we see in MCF-7 cells are conserved between species and likely play a more general role
in ER action in other cellular systems.

A  screen  for  common  sequences  enriched  in  genuine  ER-binding  regions  suggests 
the  importance  of  Forkhead  factors  in  estrogen  action 

An  unbiased  search  for  common  sequence  motifs  (Liu  et  al.,  2002)  within  the  57 
individual  ER-binding  sites  on  chromosomes  21  and  22  revealed  the  significant 
recurrence  of  two  motifs.  A  consensus  15  base  sequence  was  present  in  49%  of  all  the  ER 
binding  sites  on  chromosomes  21  and  22  (Fig  4B)  and  is  identical  to  the  canonical  ERE 
(Klinge, 2001). The likelihood of an ERE occurring in one of the ER binding sites was
very significantly increased when compared to all of chromosomes 21 and 22
(p = 1.33E-15). In addition, in the ER binding sites lacking a canonical ERE, a majority
were found to contain one or more ERE half-sites. The occurrence of ERE half-sites was
also non-random (p = 2.16E-14). In order to confirm that our failure to find ER binding at
other EREs (5,500 predicted EREs on chromosomes 21 and 22, as listed in Fig 1A and 1B) was
not  due  to  the  insensitivity  of  the  ChIP-microarray  technique,  we  performed  ChIP  for  ER 
followed  by  PCR  for  several  randomly  selected,  predicted  but  non-functional  perfect 
EREs  on  chromosomes  21  and  22.  No  ER  association  was  found  at  any  of  these  sites 
(data  not  shown). 

We  next  determined  whether  DNA  sequences  other  than  the  classical  ERE  were  found  at 
the  ER  binding  sites  by  analyzing  the  bound  sequences  for  conserved  motifs  after 
removing the EREs. This analysis revealed the presence of a Forkhead factor binding site
in 56% of the 57 ER binding regions (Fig 4B), a finding that would only occur by chance
with a probability of p = 1.23E-8. Forkhead binding motifs were found in 64% of the
ER-binding regions that contain a canonical ERE. Using the consensus Forkhead motif
recurring  within  these  regions  (Fig  4B),  we  determined  the  probability  of  this  motif 
residing  within  predicted  ERE  regions  that  are  not  bound  by  ER  in  vivo  (18.45%).  This 
significant  enrichment  of  a  Forkhead  motif  within  ER  binding  regions  (p  =  3.78E-7) 
suggests  the  presence  of  adjacent  Forkhead  motifs  may  play  a  role  in  determining  ER 
binding.  The  proportion  of  the  57  ER  binding  sites  containing  EREs  or  Forkhead  motifs 
is  presented  in  Fig  4C.  The  finding  that  the  largest  category  of  sites  contains  both  an  ERE 
and  a  Forkhead  motif  (47.4%)  strongly  suggests  a  functional  interaction. 
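
The kind of comparison described above (a motif present in 56% of 57 bound regions against a background rate of 18.45% in unbound predicted ERE regions) can be illustrated with a one-sided binomial tail. This is only a sketch: the text does not state which statistical test produced the reported p-values, so the choice of a binomial test is an assumption and the result is not expected to reproduce p = 3.78E-7. The function name is hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of seeing k or more hits."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Figures taken from the text: 57 ER binding regions, 56% with a Forkhead
# motif, 18.45% background rate in unbound predicted ERE regions.
n_sites = 57
observed = round(0.56 * n_sites)   # number of regions carrying the motif
background_rate = 0.1845

# Assumption: a one-sided binomial test; the paper's actual test is not stated.
p_value = binomial_tail(observed, n_sites, background_rate)
print(f"{observed}/{n_sites} regions, one-sided binomial p = {p_value:.3g}")
```

Whatever test is used, the point is the same: the motif frequency in bound regions is far higher than the background frequency would predict.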

Forkhead proteins play a combinatorial and essential role in ER binding and
ER-mediated gene transcription

A combinatorial interaction between Forkhead and ER pathways has been previously
suggested for a small number of specific genes. HNF-3α (FoxA1) Forkhead binding
domains within the promoter of the estrogen-regulated genes TFF-1 (Beck et al., 1999)
and Vitellogenin B1 (Robyr et al., 2000) have been shown to be important for gene
transcription. Forkhead proteins have been shown to interact with the ER protein in yeast
two hybrid experiments (Schuur et al., 2001). The function of Forkhead proteins can be
regulated by their nuclear-cytoplasmic distribution depending on their phosphorylation
(Brunet et al., 1999; Kops et al., 1999). We therefore assessed the localization of the
Forkhead factor, FoxA1, after vehicle or estrogen stimulation of MCF-7 cells. FoxA1
protein predominantly resides in the nucleus in both vehicle and estrogen treated cells
(data not shown).

We next determined whether FoxA1 was recruited along with ER to the ER-binding
domains. ChIP of FoxA1, followed by real-time PCR of all 57 ER binding regions on
chromosomes 21 and 22, revealed a high degree of concordance between regions that
recruit ER and FoxA1. Approximately 48% of all of the ER binding domains showed
FoxA1 interaction, although the pattern of recruitment differed from site to site. A
majority of the regions containing FoxA1 did so in the absence of estrogen, but FoxA1
binding was decreased following estrogen stimulation. This was the case for NRIP-1
enhancer 1, DSCAM-1 enhancer 1 and TFF-1 promoter (Fig 5A). FoxA1 association
with XBP-1 enhancer 2 was clearly observed, but was not diminished after estrogen
addition (Fig 5A). All of these ER binding sites contained a Forkhead motif and an ERE
or ERE half site (Fig 5B). FoxA1 was not seen to bind to XBP-1 enhancer 3, which lacks
a Forkhead motif (Fig 5). However, several regions containing Forkhead motifs did not
recruit FoxA1 and several ER binding domains that lacked Forkhead motifs did bind
FoxA1. This complex interplay between FoxA1, ER and binding sites within chromatin
likely involves regions adjacent to the ER binding sites and may involve other proteins.
Despite this, it is clear that a significant proportion of ER binding sites, especially those
adjacent to actively transcribed genes, contain FoxA1 prior to estrogen stimulation and ER
recruitment to the same regions.


To determine the importance of FoxA1 in mediating ER association with chromatin, we
developed siRNA to the 3’UTR of FoxA1 mRNA. Specific targeted knockdown of
FoxA1 protein was achieved (Fig 6A), without changes in control protein or ER protein
levels (data not shown). A luciferase siRNA (siLuc) was used as a negative control.
MCF-7 cells were deprived of hormones for 24 hr and siLuc or siRNA to FoxA1 was
transfected for 6 hr, after which hormone depleted media was added for a further 48 hr
and cells were stimulated with estrogen or vehicle. ER ChIP and real time PCR of a
number of previously validated binding sites were performed. The decrease in FoxA1
completely impeded the ability of ER to bind to TFF-1 promoter, XBP-1 enhancer 1 and
NRIP-1 enhancer 2 (Fig 6B), as well as DSCAM-1 enhancer 1 (data not shown). No
changes were observed on the XBP-1 promoter, which functioned as a negative control
(Fig 6B).

Since the targeted knockdown of FoxA1 inhibited the ability of ER to associate with in
vivo ER binding sites, we assessed the effect of Forkhead down-regulation on
estrogen-mediated transcription. After siLuc or siFoxA1 transfection, cells were stimulated
with estrogen or vehicle for 6 hr and mRNA changes in all 12 estrogen target genes on
chromosomes 21 and 22 were assessed. The estrogen-induced increases in all 12 estrogen
targets were abolished when FoxA1 was down-regulated (Fig 6C), but no changes were
observed in GAPDH control mRNA levels. The essential role for the FoxA1 Forkhead
protein during transcription of all estrogen target genes on chromosomes 21 and 22
confirms a general requirement of FoxA1 for ER transcription.


Discussion 


A  complete  picture  of  ER-mediated  gene  activation  has  begun  to  emerge  in  recent  years, 
with  a  coordinated  and  timely  cycling  of  receptor,  nuclear  coactivators,  chromatin 
remodelling  proteins  and  the  transcription  machinery  on  and  off  target  promoters 
(Metivier  et  al.,  2003;  Shang  et  al.,  2000).  However,  these  studies  oversimplify  the 
problem  by  focusing  on  the  promoter  proximal  region  of  one  or  two  target  genes  and 
largely  ignore  the  remaining  chromosomal  sequence.  Here  we  have  interrogated  the 
association  of  ER  across  entire  chromosomes,  including  intergenic  regions  that  contain 
potential  cis-regulatory  domains.  These  ChIP-microarray  experiments  demonstrate  the 
ability  to  identify  genuine  in  vivo  ER  protein  binding  sites  in  previously  unexplored 
regions  of  the  genome.  Interestingly,  while  a  few  of  the  ER  binding  sites  were  found 
directly  adjacent  to  ER  target  genes,  most  were  found  at  significant  distances  including 
several  >100  kb  removed  from  transcription  start  sites.  Of  the  57  ER  binding  sites  (within 
32  potential  transcriptional  regulatory  clusters),  only  a  very  small  number  of  proximal 
promoters recruited ER, despite the fact that other genes were estrogen induced. The
presence of multiple components of the transcriptional machinery at the distal sites, and
the demonstration by chromosome conformation capture assays that these distant
sites are physically associated with promoter-proximal regions, suggest that they play an
important role in estrogen-mediated regulation.

A significant volume of work has focused on identifying essential domains within the
proximal promoters of known estrogen regulated genes using in vitro methods (Dubik
and Shiu, 1992; Petz et al., 2002; Porter et al., 1996; Teng et al., 1992; Umayahara et al.,
1994; Vyhlidal et al., 2000; Weisz and Rosales, 1990). The conclusions drawn from this
large volume of data implicate a number of motifs, including Sp1, AP-1 and GC rich
regions as important cis-regulatory domains in ER-mediated transcription. However, our
data demonstrate ER regulatory sites at distances several orders of magnitude greater than
was focused on in the past, suggesting that they may function in ways analogous to the
β-globin LCR, which has its major effect subsequent to PIC formation (Sawado et al.,
2003).

Non-biased  motif  scanning  of  the  genuine  in  vivo  ER  binding  sites  identified  a  canonical 
estrogen  responsive  element  (ERE)  in  the  majority  of  ER  binding  sites  that  represented 
only  1.5%  of  EREs  predicted  by  bioinformatics  alone.  Previous  approaches  for  motif 
identification  involved  computational  based  methods  for  identifying  response  elements, 
after  which  gene  proximal  sites  are  included  as  potential  binding  domains  (Bajic  and 
Seah,  2003;  Bourdeau  et  al.,  2004).  The  current  data  suggest  that  while  ER  binding 
involves  interaction  with  consensus  ERE  motifs,  the  presence  of  such  motifs  is 
insufficient  to  dictate  receptor-chromatin  association.  Furthermore,  the  exclusion  of 
response  elements  further  than  several  kilobases  from  transcription  start  sites  eliminates 
distal  regulatory  regions  that  may  be  the  primary  receptor-chromatin  interaction  sites. 

Since the presence of an ERE alone is insufficient to define an authentic ER regulatory
site, we searched for other conserved sequences and found that Forkhead factor binding
sites are present near authentic EREs significantly more frequently than near those that
do not bind ER. We showed that binding of a Forkhead factor (FoxA1) was essential for
ER-chromatin interactions and subsequent expression of estrogen gene targets. FoxA1
protein can bind condensed chromatin via its winged-helix DNA binding domains that
mimic histone linker proteins (Cirillo et al., 2002; Cirillo et al., 1998). Unlike histone
proteins, however, FoxA1 does not contain the amino acid composition to condense
chromatin and is therefore thought to promote euchromatic conditions. As such, it is
possible that the presence of FoxA1 identifies specific regions within chromatin to
facilitate the association of the ER transcription complex. Our data suggest that FoxA1 is
present on the chromatin at a number of regions, after which ER can associate with these
specific sites. Down-regulation of FoxA1 inhibits the ability of ER to associate with its
binding sites, confirming the requirement for Forkhead directed association of ER with
chromatin, despite the fact that these sites contain sufficient information, in the form of
an ERE, for ER docking. A recent investigation has shown that FoxA1 can directly
modulate chromatin in the MMTV promoter and can positively enhance transcription by
the Glucocorticoid Receptor (Holmqvist et al., 2005), supporting a general model for
FoxA1 involvement in nuclear receptor transcription.

We have taken an unbiased approach to identify regions of chromatin, both promoter
proximal and intergenic sequences, that are involved in ER-mediated transcriptional
activity. We find a limited number of bona fide ER binding sites on chromosomes 21 and
22, with a significant enrichment of canonical ERE palindromes and half sites within the
binding sites. Moreover, the presence of Forkhead binding motifs and the subsequent
identification of a functional role for Forkhead proteins exemplify the power of this
methodological approach to identify important regulatory domains within the vast regions
of unexplored sequence in the human genome.

Materials  and  Methods 

Chromatin Immunoprecipitation (ChIP)-microarray preparation

ChIP was performed as previously described (Shang et al., 2000), with the following
modifications. 2 µg of antibody was prebound for a minimum of 4 hr to protein A and
protein G Dynal magnetic beads (Dynal Biotech, Norway) and washed three times with
ice-cold PBS plus 5% BSA, and then added to the diluted chromatin and
immunoprecipitated overnight. The magnetic bead-chromatin complexes were collected
and washed 6 times in RIPA buffer (50 mM HEPES pH 7.6, 1 mM EDTA, 0.7% Na
deoxycholate, 1% NP-40, 0.5 M LiCl). Elution of the DNA from the beads was as
previously described (Shang et al., 2000). Antibodies used were: ERα (Ab-10) from
Neomarkers (Lab Vision, UK), ERα (HC-20), RNA PolII (H-224), AIB-1/RAC3 (C-20),
TFIID (SI1), HNF-3α/FoxA1 (H-120), mouse IgG (sc-2025) and rabbit IgG (sc-2027)
from Santa Cruz (Santa Cruz Biotechnologies, CA). Ligation-Mediated PCR was
performed as previously described (Kapranov et al., 2002).

Data  Analysis 

1,054,325 probe pairs were mapped to chromosomes 21 and 22 according to the NCBIv33
GTRANS Libraries provided by Affymetrix. A (PM-MM) value was recorded for each
probe pair, and a probe pair was removed if either PM or MM was flagged as an outlier by
the Affymetrix GCOS software. The five samples (three ER+ ChIP and three genomic
inputs) were normalized by quantile normalization (Bolstad et al., 2003) based on a
combined 76 ChIP experiments obtained from the public domain and the Dana-Farber
Cancer Institute. The behavior of every probe pair i, assumed to be $N(\mu_i, \sigma_i^2)$,
was estimated from the 76 normalized experiments. A two-state (ChIP-enriched state and
non-enriched state) Hidden Markov Model with the following parameters was applied to
each sample to estimate the probability of ChIP-enrichment at each probe pair location:

Transition probabilities: $300 / 1{,}054{,}325$ for transition to a different state

$1 - 300 / 1{,}054{,}325$ for staying in the same state

Emission probabilities: $N(\mu_i, \sigma_i^2)$ for non-enriched hidden state

$N(\mu_i + 2\sigma_i, (1.5\sigma_i)^2)$ for enriched hidden state
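
The two-state model above can be sketched with a standard forward-backward pass that returns, for each probe pair, the posterior probability of the enriched state. This is a minimal illustration, not the analysis code used for the study: the function names are hypothetical, the equal starting probabilities are an assumption (they are the stationary distribution of a symmetric two-state chain), and the simple per-step rescaling is only adequate for values that do not underflow both emission densities.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def enrichment_posteriors(values, mu, sigma, p_switch=300 / 1_054_325):
    """Posterior probability of the enriched state at each probe pair.

    values: normalized (PM-MM) values for one sample
    mu, sigma: per-probe mean and standard deviation (background behavior)
    """
    n = len(values)
    if n == 0:
        return []
    # Emission likelihoods per probe: (non-enriched, enriched).
    emit = [
        (gaussian_pdf(x, m, s), gaussian_pdf(x, m + 2.0 * s, 1.5 * s))
        for x, m, s in zip(values, mu, sigma)
    ]
    stay = 1.0 - p_switch
    trans = ((stay, p_switch), (p_switch, stay))

    # Forward pass, rescaled at every step to avoid underflow.
    fwd = []
    for t in range(n):
        if t == 0:
            a = [0.5 * emit[0][k] for k in (0, 1)]  # assumed equal start
        else:
            prev = fwd[-1]
            a = [
                emit[t][k] * (prev[0] * trans[0][k] + prev[1] * trans[1][k])
                for k in (0, 1)
            ]
        total = a[0] + a[1]
        fwd.append((a[0] / total, a[1] / total))

    # Backward pass, also rescaled.
    bwd = [(1.0, 1.0)] * n
    for t in range(n - 2, -1, -1):
        nxt = bwd[t + 1]
        b = [
            trans[k][0] * emit[t + 1][0] * nxt[0]
            + trans[k][1] * emit[t + 1][1] * nxt[1]
            for k in (0, 1)
        ]
        total = b[0] + b[1]
        bwd[t] = (b[0] / total, b[1] / total)

    # Combine and normalize per position.
    return [
        f[1] * b[1] / (f[0] * b[0] + f[1] * b[1]) for f, b in zip(fwd, bwd)
    ]


# Toy example: flat background with a 10-probe block shifted by 3 sigma.
toy = [0.0] * 20 + [3.0] * 10 + [0.0] * 20
post = enrichment_posteriors(toy, [0.0] * 50, [1.0] * 50)
```

Because the switching probability is very small, isolated high probes are not called enriched; only sustained runs of elevated signal overcome the cost of entering and leaving the enriched state.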

To combine the results from the six samples, an enrichment score was calculated as the
average enrichment probability in the three ER+ ChIP samples minus the average
enrichment probability in the two genomic input samples. Since the tiling array has one
25-mer probe in every 35 bp of non-repeat regions, the coverage of every probe was
extended by 10 bp on both ends. An enriched region is defined as a run of probes with
enrichment score > 50% and covering at least 125 bp. Each enriched region can tolerate
up to two neighboring probes with enrichment score between [10%, 50%]. If two
neighboring probes are more than 210 bp apart, the enriched region is broken into two
separate blocks. A summary enrichment score was obtained for each enriched region, which
is the enrichment score summation for all the probes in the region divided by the square
root of the number of probes in the region. This summary enrichment score represents the
relative confidence of a predicted enriched region.
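
The region-calling rules above can be sketched as a single pass over probes sorted by position. This is an illustrative reading of the text rather than the original implementation: the function and constant names are hypothetical, "up to two neighboring probes" is interpreted as at most two consecutive intermediate-score probes inside a run, trailing intermediate probes are trimmed, and coverage is measured on the extended probe footprint.

```python
import math

PROBE_LEN = 25      # 25-mer probes
EXTEND = 10         # coverage extended by 10 bp on both ends
MAX_GAP_BP = 210    # neighboring probes further apart split a region
MIN_COVER_BP = 125  # minimum coverage of an enriched region


def _finish(run, positions, scores, hi, regions):
    """Trim trailing weak probes, then record the run if it is long enough."""
    while run and scores[run[-1]] <= hi:
        run.pop()
    if not run:
        return
    start = positions[run[0]] - EXTEND
    end = positions[run[-1]] + PROBE_LEN + EXTEND
    if end - start >= MIN_COVER_BP:
        total = sum(scores[i] for i in run)
        # Summary score: sum of scores divided by sqrt(number of probes).
        regions.append((start, end, total / math.sqrt(len(run))))


def call_enriched_regions(positions, scores, hi=0.5, lo=0.1, max_weak=2):
    """Return (start, end, summary_score) for each enriched region.

    positions: probe start coordinates, sorted ascending
    scores: enrichment score per probe (fractions, so 50% is 0.5)
    """
    regions, run, weak = [], [], 0
    for i, (pos, s) in enumerate(zip(positions, scores)):
        # A large gap between neighboring probes breaks the region.
        if run and pos - positions[run[-1]] > MAX_GAP_BP:
            _finish(run, positions, scores, hi, regions)
            run, weak = [], 0
        if s > hi:
            run.append(i)
            weak = 0
        elif s >= lo and run and weak < max_weak:
            run.append(i)  # tolerated intermediate probe
            weak += 1
        else:
            _finish(run, positions, scores, hi, regions)
            run, weak = [], 0
    _finish(run, positions, scores, hi, regions)
    return regions


# Toy example: five probes at 35 bp spacing, one intermediate, plus a
# distant single probe that is too short to be called.
example = call_enriched_regions(
    [0, 35, 70, 105, 140, 1000],
    [0.9, 0.8, 0.3, 0.95, 0.9, 0.9],
)
```

Dividing by the square root of the probe count, rather than the count itself, lets longer regions with consistently high scores rank above short ones.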

The genomic DNA of every ChIP-enriched region was retrieved from the UCSC genome
browser and ranked by the summary enrichment score. The MDscan algorithm (Liu et al.,
2002) was applied to the sequences to find enriched sequence patterns representing the
putative estrogen receptor binding motif. To find a motif of width w, MDscan first
enumerates each w-mer in the highest ranking sequences, and collects other w-mers similar
to it in these sequences to construct a candidate motif as a probability matrix. A semi-Bayes
scoring function was used to remove low scoring candidate motifs, and to refine the rest by
checking all w-mers in all the ChIP-enriched sequences. A high scoring motif (with
similar consensus) consistently reported multiple times at different motif widths indicates
a strong prediction.
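
The first step described above, collecting w-mers similar to a seed and turning them into a probability matrix, can be sketched as follows. This is a simplified illustration and not the MDscan implementation: the similarity rule (a minimum number of matching positions), the pseudocount, and all function names are assumptions, and the semi-Bayes scoring and refinement steps are omitted.

```python
def wmers(seq, w):
    """All w-mers of a sequence that contain only A, C, G or T."""
    out = []
    for i in range(len(seq) - w + 1):
        word = seq[i:i + w]
        if set(word) <= set("ACGT"):
            out.append(word)
    return out


def matches(a, b):
    """Number of positions at which two equal-length words agree."""
    return sum(x == y for x, y in zip(a, b))


def candidate_motif(seed, sequences, min_match, pseudocount=0.5):
    """Collect w-mers similar to the seed and build a probability matrix.

    Returns (members, matrix); matrix[pos][base] is the estimated
    probability of each base at each motif position.
    """
    w = len(seed)
    members = [
        word
        for seq in sequences
        for word in wmers(seq, w)
        if matches(seed, word) >= min_match
    ]
    matrix = []
    for pos in range(w):
        counts = {base: pseudocount for base in "ACGT"}
        for word in members:
            counts[word[pos]] += 1
        total = sum(counts.values())
        matrix.append({base: c / total for base, c in counts.items()})
    return members, matrix


# Toy example with a hypothetical seed (an ERE half-site-like word).
members, matrix = candidate_motif(
    "GGTCA", ["AAGGTCAAA", "TTGGTCATT", "CCCCCCCC"], min_match=5
)
```

In the full procedure each w-mer in the top-ranked sequences would serve as a seed in turn, and the resulting candidate matrices would then be scored and refined against all ChIP-enriched sequences.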

Species  Conservation 

We expanded all 57 of the ER binding sites equally in each direction to a length of 6
kb. The human-mouse conservation score of each nucleotide in the expanded binding
region is defined as the average sequence identity ((#Matched Nucleotides - #Indels)/500) of
a 500-mer window centered at the nucleotide. The human (hg15) / mouse (mm3) BLASTZ
(Schwartz et al., 2003) genome alignments were downloaded from
http://genome.ucsc.edu.
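
The per-nucleotide conservation score can be sketched with a sliding 500-mer window over an alignment summary. This is an illustration only: the one-letter status encoding ('M' match, 'X' mismatch, 'I' indel) and the function name are hypothetical, the score is read as (matches minus indels) divided by 500, and windows truncated at the ends of the region are still divided by the full window size.

```python
def conservation_scores(status, window=500):
    """Per-position conservation score from an alignment status string.

    status: one character per human nucleotide, 'M' (match),
            'X' (mismatch) or 'I' (indel); a hypothetical encoding.
    """
    # Prefix sums of (+1 for a match, -1 for an indel, 0 otherwise).
    prefix = [0]
    for ch in status:
        step = 1 if ch == "M" else -1 if ch == "I" else 0
        prefix.append(prefix[-1] + step)

    n = len(status)
    half = window // 2
    scores = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i - half + window)
        scores.append((prefix[hi] - prefix[lo]) / window)
    return scores


# Toy example: a fully matched 1 kb stretch.
perfect = conservation_scores("M" * 1000)
```

Prefix sums keep the computation linear in the region length, which matters little for 57 regions of 6 kb but makes the windowing explicit.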


Real  time  PCR 


Primers were selected using Primer Express (Applied Biosystems). 5 µl of precipitated
and purified DNA was subjected to PCR using the Applied Biosystems SYBR Green
Mastermix.  Relative  DNA  quantities  were  measured  using  the  PicoGreen  system 
(Molecular  Probes,  OR).  All  primer  sequences  and  locations  are  listed  in  supplemental 
data  2. 

Double  stranded  cDNA  synthesis 

Total RNA was converted to double stranded cDNA according to the manufacturer’s
instructions for Invitrogen Superscript Double stranded cDNA synthesis. The RNA was
primed with 250 ng oligo(dT) (Invitrogen) and 25 ng random hexamers (Gibco). cDNA
was fragmented and labelled as described above.

5’RACE 

5’ RACE was performed according to the manufacturer’s instructions (Invitrogen). The
primer sequences used were: NRIP-1 RT primer (5’-TGCCTGATGCATTAGTAATCC-3’),
NRIP-1 nested primer 1 (5’-GAGCCAAGCTCTTCTCCATGAGTCATGTTC-3’)
and NRIP-1 nested primer 2 (5’-ACCTTCCATCGCAATCAGAGAGAGACGTACTG-3’).
The PCR product was cloned and sequenced by standard methods.


Chromosome  capture  assay 

Fixed  chromatin  was  digested  overnight  with  specific  restriction  enzymes  after  which  ER 
ChIP  was  set  up  as  described  above.  After  overnight  ChIP,  the  beads  were  precipitated 
and  resuspended  in  ligation  buffer  (NEB,  MA,  USA)  and  overnight  ligation  was 
performed.  The  beads  were  collected,  washed  and  the  formaldehyde  cross-linking  was 
reversed  as  described  above.  Primers  used  to  amplify  annealed  fragments  were  as 
described  in  supplemental  data  2. 

Luciferase  transcriptional  activity 

ER  binding  sites  were  amplified  by  PCR  and  cloned  into  the  pGL3-promoter  vector 
(Promega).  Hormone  depleted  MCF-7  cells  were  transfected  with  each  of  the  ER  binding 
domain  vectors  with  Lipofectamine  2000  (Invitrogen)  and  total  protein  lysate  was 
harvested  after  estrogen  or  ethanol  addition  for  24  hr.  Transfections  were  normalized  by 
the  co-transfection  of  the  pRL-Null  Renilla  luciferase  vector  and  Renilla  and  Firefly 
luciferase  activity  was  assessed  using  the  dual  luciferase  kit  (Promega). 
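
The normalization described above amounts to dividing Firefly by Renilla activity in each well and then comparing estrogen with vehicle. The sketch below illustrates that arithmetic only; the function names and the example readings are hypothetical and are not data from this study.

```python
def normalized_activity(firefly, renilla):
    """Firefly signal corrected for transfection efficiency via Renilla."""
    if renilla <= 0:
        raise ValueError("Renilla reading must be positive")
    return firefly / renilla


def fold_induction(estrogen, vehicle):
    """Estrogen over vehicle ratio of normalized activities.

    Each argument is a (firefly, renilla) pair of raw readings.
    """
    return normalized_activity(*estrogen) / normalized_activity(*vehicle)


# Hypothetical readings for one construct (not measured values).
example_fold = fold_induction(estrogen=(6000.0, 100.0), vehicle=(1500.0, 50.0))
```

A construct would be compared against the empty pGL3-promoter vector processed the same way, so that any estrogen response of the vector backbone itself is accounted for.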

Western  blotting 

SDS-PAGE  was  performed  as  previously  described  (Carroll  et  al.,  2000).  Antibodies 
used were HNF-3α (ab5089) from AbCam (Cambridge, UK) and Calnexin (H-70) from
Santa  Cruz  (CA,  USA). 


Short  interfering  (si)  RNA 


A 21 bp siRNA was designed against the FoxA1 transcript and synthesized by
Dharmacon (Lafayette, CO). siRNA was transfected using Lipofectamine 2000
(Invitrogen). The siRNA sequences used were: siFoxA1 sense
5’-GAGAGAAAAAAUCAACAGC-3’ and antisense 5’-GCUGUUGAUUUUUUCUCUC-3’;
siLuc sense 5’-CACUUACGCUGAGUACUUCGA-3’ and antisense
5’-UCGAAGUACUCAGCGUAAGUG-3’.
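
As a consistency check on the sequences listed above, each antisense strand should be the reverse complement of its sense strand. The short sketch below performs that check; the function name is hypothetical and any overhangs added at synthesis are not considered.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq):
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq))


# Sense/antisense pairs as listed in the text.
SIRNA_PAIRS = {
    "siFoxA1": ("GAGAGAAAAAAUCAACAGC", "GCUGUUGAUUUUUUCUCUC"),
    "siLuc": ("CACUUACGCUGAGUACUUCGA", "UCGAAGUACUCAGCGUAAGUG"),
}

for name, (sense, antisense) in SIRNA_PAIRS.items():
    ok = reverse_complement_rna(sense) == antisense
    print(f"{name}: antisense matches reverse complement of sense: {ok}")
```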

Figure legends

Fig.  1 

Map of ER binding sites on chromosomes 21 and 22 after estrogen stimulation. Visual
representations of ER-binding sites on chromosomes 21 (A) and 22 (B) are shown.
Gene locations are shown as blue bars and are based on the 2003 genome
freeze in the UCSC browser using Genbank RefSeq positions. Predicted EREs are shown
as black bars and ER-binding sites are shown as red bars. (A) An expanded view of the
TFF-1 gene region is shown as signal difference between ER ChIP and Input DNA for
both the estrogen and vehicle treated cells. The TFF-1 gene is shown in its genuine 3’-5’
orientation. The gene adjacent to TFF-1 is not an estrogen target. (B) Expanded view of
the XBP-1 gene region on chromosome 22. The XBP-1 gene is shown in its genuine 3’-5’
orientation.

Fig.  2 

Validation  of  the  in  vivo  binding  of  the  transcription  complex  to  regulatory  regions.  ChIP 
of ER and standard PCR of sites adjacent to TFF-1 (A), XBP-1 (B), DSCAM-1 (C),
NRIP-1  (D)  and  SOD-1  (E).  TFF-1  non-specific  and  XBP-1  promoter  primers  were 
included  as  negative  controls.  The  lanes  are  vehicle  (V),  estrogen  (E)  and  Input  (I).  (F) 
ChIP  of  ER,  RNA  PolII,  AIB-1  or  IgG  control  and  real-time  PCR  of  the  enhancer 
regions.  The  data  are  estrogen-mediated  fold  enrichment  compared  to  vehicle  (ethanol) 
control  and  are  the  average  of  three  separate  replicates.  The  color  intensity  reflects  the 
fold  change  as  described  in  the  legend.  TFF-1  non-specific  and  XBP-1  non-specific 
primers  were  included  as  negative  controls. 

Table  1 

List  of  ER  binding  site  clusters  and  relative  locations  to  putative  gene  targets.  The  32 
transcriptional  clusters  are  shown,  with  the  start  and  stop  locations  of  the  ER  binding 
sites. 

Fig.  3 

Interaction  of  promoter-enhancer  domains  and  transcriptional  activity  of  enhancer 
regions. (A) Chromosome capture assay was performed after digesting fixed chromatin
from vehicle or estrogen treated cells with the BtgI restriction enzyme. Primers flanking
the TFF-1 promoter and enhancer were used to amplify DNA after ligation. Undigested
controls  and  no  ligase  controls  were  included.  (B)  Chromatin  was  digested  with  BsmI  and 
one  primer  flanking  the  NRIP-1  promoter  and  one  in  enhancer  3  region  were  used  to 
amplify  a  specific  product  after  ligation.  (C)  ER  binding  sites  were  cloned  into  the  pGL-3 
promoter  vector  and  transfected  into  hormone  depleted  MCF-7  cells,  after  which  vehicle 
or  estrogen  was  added.  Empty  pGL3-promoter  vector  was  used  as  a  negative  control. 
Co-transfection  of  pRL-Null  Renilla  vector  was  included  as  a  normalizing  control.  The 
open  bars  represent  vehicle  treated  and  black  bars  represent  estrogen  treatment. 

Fig.  4 

Conservation  of  ER  binding  sites  and  presence  of  enriched  motifs.  (A)  Sequence 
homology  of  ER  binding  sites  and  surrounding  sequence  between  human  and  mouse 
genomes.  The  center  of  ER  peaks  is  designated  co-ordinate  0.  (B)  An  unbiased  motif 
screen  of  all  the  ER  binding  sites  on  chromosomes  21  and  22  revealed  the  presence  of 
two  enriched  motifs,  an  ERE  and  a  Forkhead  binding  motif,  both  of  which  are  visually 
represented  in  WebLogo  (http://weblogo.berkeley.edu).  (C)  The  occurrence  of  ERE  or 
ERE  half  sites  and  Forkhead  sites  within  the  57  ER  binding  sites  on  chromosomes  21  and 
22. 


Fig.  5 

Recruitment of Forkhead protein to ER-binding domains. (A) ChIP of FoxA1 followed
by real-time PCR of NRIP-1 enhancer 1, DSCAM-1 enhancer 1, TFF-1 promoter and
XBP-1 enhancer 1. XBP-1 enhancer 2 is included as a control which does not recruit
FoxA1. Data are shown as fold change versus input. (B) Schematic diagram showing the
relative location of ERE motifs (inverted green arrows), ERE half-sites (blue arrows) and
Forkhead motifs (red arrows). Chromosome nucleotide locations are given.


Fig.  6 

Specific targeted knockdown of FoxA1 protein and the effects on estrogen-mediated
transcription. (A) siRNA to FoxA1 was transfected into hormone depleted MCF-7 cells
and changes in protein levels were determined after vehicle or estrogen treatment. siLuc
was used as a transfection control and Calnexin was used as a loading control. (B) ER
ChIP was performed after vehicle or estrogen treatment of siLuc or siFoxA1 transfected
cells  and  real  time  PCR  was  conducted  on  TFF-1  promoter,  XBP-1  enhancer  1,  NRIP-1 
enhancer  2  and  XBP-1  enhancer  2  as  a  negative  control.  The  data  are  fold  enrichment 
over  vehicle  treated.  (C)  Changes  in  mRNA  levels  of  all  estrogen-regulated  genes  on 
chromosomes  21  and  22.  The  data  are  estrogen-mediated  fold  enrichment  compared  to 
vehicle  (ethanol)  control  and  are  the  average  of  three  separate  replicates.  The  color 
intensity reflects the fold change as described in the legend.

Background

Transcription  by  RNA  Polymerase  II  is  regulated  at  many  steps  including  initiation, 
promoter  release,  elongation  and  termination.  Accumulation  of  RNA  Polymerase  II  at 
particular  locations  across  genes  can  be  indicative  of  sites  of  regulation.  RNA  Polymerase  II 
is  thought  to  accumulate  at  the  promoter  and  at  sites  of  co-transcriptional  alternative  splicing 
where  the  rate  of  RNA  synthesis  slows. 

Results 

In  order  to  further  understand  transcriptional  regulation  at  a  global  level,  we  determined  the 
distribution  of  RNA  Polymerase  II  within  regions  of  the  human  genome  designated  by  the 
ENCODE  project.  Hypophosphorylated  RNA  Polymerase  II  localizes  almost  exclusively  to 
5'  ends  of  genes.  On  the  other  hand,  localization  of  total  RNA  Polymerase  II  reveals  a 
variety  of  distinct  landscapes  across  many  genes  with  74%  of  the  observed  enriched 
locations  at  exons.  RNA  Polymerase  II  accumulates  at  many  annotated  constitutively 
spliced  exons,  but  is  biased  for  alternatively  spliced  exons.  Finally,  RNA  Polymerase  II  is 
also  observed  at  locations  not  in  gene  regions. 

Conclusions 

Localizing  RNA  Polymerase  II  across  many  millions  of  base  pairs  in  the  human  genome 
identifies  novel  sites  of  transcription  and  provides  insights  into  the  regulation  of 
transcription  elongation.  These  data  indicate  that  RNA  Polymerase  II  accumulates  most 
often at exons during transcription. Thus, a major factor of transcription elongation control
in mammalian cells is the coordination of transcription and pre-mRNA processing to define
exons.

Results  and  Discussion 

Transcriptional  and  post-transcriptional  regulation  of  gene  expression  intersect  at 
RNA  Polymerase  II.  The  rate  of  RNA  Polymerase  II  movement  is  altered  by  loading  of 
transcription  factors  at  the  promoter,  chromatin  structure,  pre-mRNA  processing,  elongation 
control  and  termination  [1-3].  Thus,  RNA  Polymerase  II  accumulates  at  promoters  as  well 
as  at  different  locations  across  a  particular  gene  [4],  but  the  general  patterns  across  many 
different genes have yet to be explored. Numerous factors such as histones, post-translational
modifying enzymes, and RNA binding proteins regulate these processes [1,3]. One key
determinant  of  transcription  is  the  phosphorylation  state  of  PolII’s  C-terminal  domain  (CTD) 
[5,  6]  which  becomes  hyperphosphorylated  during  transcription  elongation  [4,  6-9].  Much 
of  our  understanding  of  transcription  elongation  comes  from  work  in  prokaryotes  and  yeast 
where  most  genes  are  intronless  [1,3].  Transcription  and  pre-mRNA  processing  are 
coordinated as the two processes affect the efficiency of each other [2, 10]. The spatial
patterns of the different phosphorylation states of RNA Polymerase II across genes remain
poorly understood in mammalian systems.

To  explore  the  range  of  locations  where  RNA  Polymerase  II  accumulates  across  the 
genome,  we  performed  chromatin  immunoprecipitation  (ChIP)  from  HeLa  S3  cells,  and 
profiled  the  purified  DNA  using  an  oligonucleotide  tiled  microarray  interrogating  the 
ENCODE  regions  [11]  covering  471  known  genes.  Two  antibodies  were  utilized,  8WG16 
and  4H8,  which  recognize  the  hypophosphorylated  (PolIIa)  or  a  phosphorylation 
independent state of the CTD of RNA Polymerase II (PolII), respectively. Thus, the 4H8
antibody recognizes the total RNA Polymerase II population. Isolated DNA was
amplified using a multiple displacement amplification (MDA) strategy (see Methods) [12].
To identify sites of enrichment, we used a non-parametric approach generalizing the
Wilcoxon signed-rank test [13]. Signals across 1000 nucleotides were used to determine a
p-value for each probe. Probes were filtered for uniqueness within the bandwidth. Probes
with p-values below 10^-4 were selected for further analysis because this threshold has a low
false positive rate as determined by PCR analysis (Figure 1). With these parameters, the
hypophosphorylation-specific PolII antibody reveals 102 occupied sites whereas the
phosphorylation-independent antibody shows 550 sites (Table 1).

RNA  Polymerase  II  has  distinct  landscapes  across  each  gene.  Figure  2  shows 
representative  genes  with  PolII  enrichments.  PolIIa  is  highly  enriched  at  transcription 
initiation  sites.  On  the  other  hand,  PolII  shows  gene-specific  landscapes  with  the  strongest 
enrichments at exons within actively transcribed loci. Active genes reveal lower p-values
across the gene compared to intergenic or inactive genes (compare Figure 1A and 1B)
indicating a relative absence of RNA PolII from the nontranscribed regions. Some smaller
genes with high exon density, such as SF1, reveal significant polymerase signal across
almost the entire locus (Figure 2A). Distinct accumulations are observed with significant
p-values around exons for both SF1 and KIAA1932. In the KIAA1932 gene, RNA Pol II is
enriched  at  a  subset  of  constitutively  and  alternatively  spliced  exons  (Figure  2C).  For  some 
genes,  RNA  Pol  II  is  enriched  at  relatively  few  locations  within  the  gene  (Figure  2D). 

An  important  question  is  to  determine  if  the  RNA  Polymerase  II  sites  are  indicative 
of  active  transcription.  We  addressed  this  in  multiple  ways.  First,  microarray  expression 
profiling  of  the  mRNA  with  Affymetrix  U133  Plus  2  chips  confirms  that  many  of  the  RNA 
Polymerase  II  associated  genes  are  actively  expressed  in  HeLa  cells  as  seen  in  a  plot  of 
mRNA  expression  level  vs.  p-value  in  Figure  3.  Genes  with  significant  RNA  Polymerase  II 
enrichment are biased towards genes with higher mRNA levels. Figure 3 also shows that
some genes have apparently high mRNA levels but no significant levels of PolII or PolIIa.
This could be due to very low transcription levels but high mRNA stability. Second, we
measured RNA from the same HeLa cells on the ENCODE tiled arrays. We observe that
34% of the PolII sites overlap with RNA signal (compared to ~8% expected at random)
while 50% of the PolII locations are within 1 kb of some RNA signal (compared to 13%
expected at random). Many sites where small pieces of RNA are synthesized, such as small
exons, may be missed due to the spacing of the oligonucleotide probes and the imperfect
nature of the probes. Third, many of the PolII and PolIIa sites overlap with annotated ESTs
and mRNAs. 87% of the PolII and 88% of the PolIIa enriched locations overlap with EST
regions, compared to 31% and 44% expected at random, respectively. Lastly, reverse
transcriptase PCR checks of KIAA1932 and DKC1 indicate that these genes are being
expressed (data not shown). These data suggest that RNA Polymerase II sites are biased
towards regions of active transcription and that determining sites of enrichment of RNA
Polymerase II is an indicator of transcription.

Levels  of  PolII  enrichment  at  internal  exons  can  vary  between  genes.  To  examine 
whether  these  patterns  are  influenced  by  expression  levels,  two  categories  were  created:  I) 
Genes  with  multiple  PolII  enrichments  at  internal  exons  and  II)  Genes  with  PolII  at  one  or 
zero internal exons. There is no significant difference in mRNA levels between the two
categories, suggesting that the number of PolII sites across the gene does not vary
significantly with RNA levels. Genes with any PolII
enrichment  at  internal  exons  are  correlated  with  higher  mRNA  levels  on  the  expression 
array.  This  is  consistent  with  reports  proposing  to  use  PolII  ChIP  to  monitor  gene 
expression  [14].  Therefore,  the  number  of  PolII  sites  at  internal  exons  may  reflect  different 
levels  of  transcription  elongation  control  and  not  just  the  sensitivity  of  the  experiment. 

Distinct  from  the  hypophosphorylation  specific  antibody,  the  phosphorylation 
independent antibody reveals diverse enrichment locations for RNA PolII. In total, 74% of
the identified PolII locations are near an annotated knownGene, Refseq or genscan exon as
summarized in Table 1 (see Supplemental Table 2 for list of PolII genscan exon locations).
Unlike PolIIa, PolII sites are distributed between the 5' and 3' ends of genes, with a slight
bias towards terminal exons over initiating exons (Figure 4). This likely reflects RNA
Polymerase II stalling during the coupled processes of transcription termination and 3' end
processing [15]. For some genes, significant PolII signal is observed >1 kb past the terminal
exon, which might indicate transcription of the longer pre-mRNA before 3' end cleavage
and polyadenylation [16]. Figure 5 shows two representative genes with significant RNA
Polymerase  II  enrichment  past  the  terminal  exon. 

Most  of  the  hypophosphorylated  PolII  locations  at  internal  exons  also  overlap  a 
transcription  initiation  site  as  the  internal  exon  in  question  is  often  the  second  exon  in  the 
gene.  Only  two  enrichment  sites  overlap  with  an  internal  exon  without  also  being  near  the 
first  exon  of  a  transcript.  One  of  these  is  at  a  CpG  island  in  the  MCF2L  gene  and  the  other 
may  be  an  alternative  transcription  initiation  site  as  annotated  in  the  HG17  assembly  at  the 
beginning  of  the  ITGB4BP  gene.  To  classify  the  remaining  sites  within  introns  or  in 
intergenic  regions,  enrichment  sites  were  compared  to  other  gene  databases.  As  summarized 
in  Table  1,  4  PolIIa  sites  are  in  introns,  but  3  of  these  are  within  resolution  of  annotated  or 
predicted  exons  leaving  only  one  location  not  overlapping  an  exon  of  some  kind.  There  are 
28  hypophosphorylated  polymerase  sites  not  in  a  Refseq  gene  region.  After  following  a 
similar  filtering  approach,  only  14  sites  remain  that  are  not  near  a  putative  exon.  Thus,  only 
14% of PolIIa enriched locations do not overlap with a known exon or actively
transcribed  region.  Supplemental  Table  2  lists  PolIIa  sites  at  predicted  exons  that  are  likely 
newly  identified  transcription  initiation  locations  in  HeLa  cells.  Figure  5  shows  two 
examples  of  RNA  Polymerase  II  and  RNA  signal  at  new  sites  of  transcription.  Based  on  the 
pattern of enrichments, it is probable that many of these predicted exons are real and are
transcription initiation locations given the observed strong bias of the 8WG16 antibody for
transcription initiation locations in well-annotated genes.

Brodsky et al.

In  order  to  determine  the  generality  of  these  observations,  all  RNA  PolII  occupancy 
sites were compared to the known genes and Refseq databases, version HG15. PolIIa is
highly enriched for the first exons around transcription initiation sites (Figure 4)
representing 77 of 551 known genes in HG16 on the array (see Supplemental Tables 1A-1D
for the entire lists).

Elongation  control  is  a  common  transcriptional  regulation  mechanism  believed  to  be 
affecting  a  wide  range  of  functional  gene  classes  [1].  In  particular,  RNA  Polymerase  II 
pausing has been proposed to be associated with alternative splicing [2]. To determine if
there  is  a  bias  for  alternative  exons,  we  counted  all  the  annotated  alternatively  spliced  exons 
in  the  knownGene  database  and  determined  the  distribution  of  PolII  enrichment  locations  on 
them.  PolII  is  enriched  at  57%  of  the  annotated  alternatively  spliced  exons  of  the  active 
genes  compared  to  37%  of  annotated  actively  transcribed  constitutively  expressed  exons. 

We  also  examined  the  distribution  of  all  PolII  p-values  on  different  types  of  exons.  Each 
exon  was  mapped  to  the  smallest  p-value  ChIP-enriched  site  that  overlaps  the  exon.  The 
cassette  exons  are  found  to  be  more  significantly  associated  with  smaller  p-values  compared 
to constitutively expressed exons according to the two-sample Kolmogorov-Smirnov test
with a two-sided p-value <0.0035.
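
The exon-level comparison described above can be sketched in a few lines. This is a minimal illustration, not the analysis code used for the paper: the function names, the interval representation (start, end) and the closed-interval overlap rule are assumptions, and only the Kolmogorov-Smirnov statistic D is computed, not its p-value.

```python
from bisect import bisect_right

def min_p_per_exon(exons, sites):
    """Map each exon (start, end) to the smallest p-value among overlapping sites.

    sites is a list of (start, end, p_value) tuples (hypothetical layout).
    Exons with no overlapping ChIP-enriched site map to None.
    """
    result = []
    for ex_start, ex_end in exons:
        overlapping = [p for s, e, p in sites if s <= ex_end and e >= ex_start]
        result.append(min(overlapping) if overlapping else None)
    return result

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        # Empirical distribution functions evaluated at each observed value.
        f_a = bisect_right(a, x) / len(a)
        f_b = bisect_right(b, x) / len(b)
        d = max(d, abs(f_a - f_b))
    return d
```

In practice the per-exon p-values for cassette and constitutive exons would be passed to ks_statistic as the two samples. A library routine such as scipy.stats.ks_2samp would additionally return a p-value for D.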

One  attractive  hypothesis  is  that  sites  of  exon  enrichment  may  reflect  weaker  splice 
sites  where  PolII  stalls  during  splice  site  recognition.  Using  two  different  empirical  methods 
to  estimate  splice  site  strength,  no  significant  differences  are  observed  between  the  exons 
overlapping PolII and those that do not [17, 18]. Alternatively, some of the annotated
constitutively expressed exons may actually be subject to alternative splicing decisions.
Kampa et al. suggest that the levels of alternative splicing are much higher than commonly
believed and annotated in the human genome from their examination of expression on tiled
arrays  [19].  Consistent  with  these  findings,  RNA  Polymerase  II  sites  may  be  predicting 
which  exons  are  being  co-transcriptionally  alternatively  spliced. 

To determine if there is any pattern for the 120 PolII enrichment sites that are in
Refseq introns, we compared these sites to the knownGene, genscan, geneid, and sgpGene
databases and found 31 within resolution of putative exons. Of the remaining 89, 57 are in
genes  with  PolII  enrichment  sites  that  also  overlap  exons  suggesting  that  they  are  actively 
transcribed  genes.  No  clear  intronic  positional  bias  is  observed. 
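
The "within resolution" test used here follows the rule given under Genomic Annotation in Materials and Methods. A minimal sketch is shown below. It assumes that the 1000 bp window is centered on the site (500 bp on each side), which is one reading of the text, and the function names and tuple layout are hypothetical.

```python
RESOLUTION = 1000  # bp; the apparent resolution stated in Materials and Methods

def site_window(start, end, resolution=RESOLUTION):
    """Return the interval used when comparing a site to annotations.

    Sites shorter than `resolution` are widened to `resolution` bp around
    their center (assumed symmetric); longer sites keep their own length.
    """
    if end - start >= resolution:
        return start, end
    center = (start + end) // 2
    half = resolution // 2
    return center - half, center + half

def near_exon(site, exons, resolution=RESOLUTION):
    """True if the (possibly widened) site overlaps any exon interval."""
    w_start, w_end = site_window(site[0], site[1], resolution)
    return any(ex_start <= w_end and ex_end >= w_start
               for ex_start, ex_end in exons)
```

Applying near_exon to each intronic site against the knownGene, genscan, geneid and sgpGene exon lists in turn would reproduce the kind of stepwise filtering summarized in Table 1.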

In  sum,  we  have  identified  new  sites  of  RNA  Polymerase  II  accumulation  across 
hundreds  of  genes  in  mammalian  cells.  The  large  majority  of  RNA  Polymerase  II  enriched 
locations  are  at  actively  transcribed  exons  with  a  bias  towards  annotated  alternatively 
spliced exons. Many of the PolII sites at annotated constitutively expressed exons may be
sites  of  alternative  splicing.  Whatever  the  eventual  splicing  decision,  these  observations 
suggest  that  events  around  exons  slow  transcription  elongation.  A  recent  study  suggests  that 
even  general  splicing  factors  may  slow  elongation  [20].  Stalling  of  RNA  Polymerase  II  near 
exons  may  function  to  slow  RNA  synthesis  in  order  to  wait  for  the  competition  of  myriad 
splicing  signals  to  be  resolved  in  order  to  define  the  exon  [21,  22].  These  ChIP  data  identify 
where  these  states  of  RNA  Polymerase  II  are  localizing  across  the  ENCODE  regions. 

Across  genes,  these  data  are  consistent  with  the  hypothesis  of  transcriptional  pausing 
at  particular  locations.  Alternatively,  it  is  possible  that  RNA  Polymerase  II  is  rearranging 
during  transcription  such  that  the  epitope  is  only  accessible  around  exons.  Thus,  the 
conformation  of  RNA  Polymerase  II  may  be  changing  and  not  the  transcription  rate. 
Nonetheless,  it  is  interesting  that  the  majority  of  observable  elongating  RNA  Polymerase  II 
accumulates  around  exons  suggesting  that  a  major  feature  of  transcription  elongation  control 
is coupling to pre-mRNA processing.

These observations differ from those in intronless genes typically found in
prokaryotes and yeast where a more uniform PolII enrichment is observed across genes [16].
What  appears  to  be  conserved  is  PolII  accumulation  in  coding  regions  compared  to  intronic 
regions.  These  data  highlight  the  complexity  and  gene  specific  nature  of  transcription 
regulation  not  only  at  transcription  initiation  and  termination  locations  but  at  specific  exons. 
Together,  these  observations  suggest  that  a  major  feature  of  transcription  elongation  control 
in  mammalian  cells  is  exon  definition.  Thus,  these  data  provide  new  insights  into  the 
coordination  of  transcription  and  pre-mRNA  processing  in  mammalian  cells. 

Materials  and  Methods 

Chromatin  Immunoprecipitation  and  DNA  amplification.  Chromatin 
immunoprecipitations  were  performed  as  described  with  the  following  modifications  [23]. 
HeLa  S3  cells  were  first  crosslinked  with  DMA  (Pierce)  for  10  minutes,  washed  with  PBS 
and  then  crosslinked  with  formaldehyde  for  10  minutes.  Cells  were  collected,  lysed,  and 
chromatin  was  sheared  by  sonication  to  an  average  length  of  1  kb  as  determined  after 
RNAse  treatment  of  the  samples  on  an  agarose  gel.  Chromatin  was  prepared  from  four 
independently  grown  batches  of  cells  and  pooled  to  generate  three  replicate 
immunoprecipitations and six input samples. Briefly, 8WG16 (Covance) and 4H8 (AbCam)
antibodies were incubated with a 50:50 mix of Dynal protein A/G beads >16 hours at 4°C in
PBS with 5 mg/ml BSA. After washing in PBS, beads with bound antibody were incubated
with chromatin from approximately 2x10^7 cells for >16 hours at 4°C. Beads were washed 8
times with RIPA buffer (50 mM Hepes, pH 7.6, 1 mM EDTA, 0.7% DOC, 1% IGEPAL, 0.5
M LiCl) before DNA was eluted at 65°C in TE/1% SDS. Crosslinks were reversed by
incubating at 65°C for >12 hours followed by proteinase K treatment, phenol extraction and
RNAse treatment. Isolated DNA was then amplified isothermally using random nonamer
primers and Klenow polymerase (Invitrogen) for >4 hours yielding approximately 2 µg of
DNA per IP. DNA was prepared and hybridized on Affymetrix ENCODE oligonucleotide
tiled arrays using the fragmentation, hybridization, staining and scanning procedure
described by Kennedy et al. [24]. Affymetrix ENCODE microarrays have interrogating
25mer oligonucleotide probes tiled every 20 basepairs on average. A sample of chromatin
was  set  aside  before  immunoprecipitation  and  used  to  represent  the  input  DNA. 

Tiled  array  analysis.  Quantile  normalization  was  used  to  make  the  distribution  of  probe 
intensities the same for all arrays [25]. In the case of the Affymetrix GTRANS software,
quantile normalization is used within treatment and control replicate sets. Nonparametric
methods  based  on  ranks  were  used  to  identify  ChIP-enriched  regions.  These  methods  make 
mild  assumptions  about  the  data  distributions  and  are  insensitive  to  outlying  observations.  A 
p-value  was  calculated  for  every  assay  probe  on  the  array.  The  set  of  probes  used  in  the 
calculation  of  this  p-value  was  defined  by  a  bandwidth  parameter  b.  All  probes  centered  on 
the  chromosome  at  positions  less  than  b  bases  5'  or  3'  of  the  given  probe  position  are  included 
in  this  set. 
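
The two preprocessing steps above, quantile normalization and selection of probes within the bandwidth b, can be illustrated as follows. This is a sketch under simplifying assumptions (ties are broken by probe order, and arrays are plain lists of equal length); it is not the GTRANS implementation, and the function names are hypothetical.

```python
def quantile_normalize(arrays):
    """Give every array the same intensity distribution.

    arrays is a list of equal-length lists of probe intensities, one per array.
    Each value is replaced by the mean, across arrays, of the values that
    share its rank. Ties are broken by probe order (a simplification).
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

def probes_in_bandwidth(positions, center, b):
    """Indices of probes centered less than b bases 5' or 3' of `center`."""
    return [i for i, pos in enumerate(positions) if abs(pos - center) < b]
```

After normalization every array has the same sorted intensities, so rank-based comparisons between IP and input arrays are not driven by array-wide scale differences.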

The Wilcoxon rank sum test [26], also known as the Mann-Whitney U test, is the
basis  of  the  p-value  statistic  computed  by  the  Affymetrix  GTRANS  software.  The  control 
and  treatment  observation  sets  are,  respectively,  the  sets  of  normalized  control  and 
normalized  treatment  intensities  from  all  replicates  and  all  probes  within  the  bandwidth.  The 
null  hypothesis  is  that  the  treatment  set  mean  is  no  larger  than  that  of  the  control  set. 
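
As an illustration of the rank sum test described above, the sketch below computes a one-sided p-value for treatment intensities exceeding control intensities. It uses the large-sample normal approximation without a tie correction, which is an assumption made for brevity and not necessarily what the GTRANS software does; the function names are hypothetical.

```python
import math

def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_p_value(treatment, control):
    """One-sided p-value that treatment intensities exceed control intensities.

    Normal approximation to the rank sum statistic, no tie correction, so it
    is only a rough guide for small or heavily tied samples.
    """
    n_t, n_c = len(treatment), len(control)
    ranks = midranks(list(treatment) + list(control))
    w = sum(ranks[:n_t])  # rank sum of the treatment observations
    mean = n_t * (n_t + n_c + 1) / 2
    sd = math.sqrt(n_t * n_c * (n_t + n_c + 1) / 12)
    z = (w - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
```

For each probe, the treatment and control arguments would be the normalized intensities from all replicates and all probes within the bandwidth. A library routine such as scipy.stats.mannwhitneyu offers exact and tie-corrected alternatives.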

To  take  into  account  probe  to  probe  variability  we  used  a  generalization  of  the 
Wilcoxon signed-rank test for blocked data. All input and IP normalized
sign(PM - MM) * max(1, |PM - MM|) intensities interrogating the same chromosomal location were
assigned to the same block. Aligned observations were derived by subtracting the median
normalized intensity for a given block from each observation in that block. All aligned
observations within the bandwidth were ranked. A statistic W was defined as the sum of the
ranks of the aligned IP observations. A p-value was derived from W, based on the joint null
distribution of the aligned input and IP ranks. The analyses are dependent on the
assumption that probes are independent. Probes were mapped to the genomic coordinates to
ensure that no probe mapped to more than one location in any 1000 bp window and no two
probes mapped to the same genomic location.
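
The blocked, aligned-rank statistic W described above can be sketched as follows. The code covers only the intensity transform and the computation of W; the derivation of a p-value from the joint null distribution is not shown. Function names and the block layout are hypothetical, and tied aligned values simply receive consecutive ranks here.

```python
from statistics import median

def transform(pm, mm):
    """sign(PM - MM) * max(1, |PM - MM|), the intensity described in the text."""
    diff = pm - mm
    sign = (diff > 0) - (diff < 0)
    return sign * max(1, abs(diff))

def aligned_rank_statistic(blocks):
    """Sum of the ranks of the aligned IP observations (the statistic W).

    blocks is a list of (ip_values, input_values) pairs, one pair per
    chromosomal location inside the bandwidth.
    """
    aligned = []  # (aligned value, is_ip)
    for ip_values, input_values in blocks:
        # Align each block by subtracting its median intensity.
        block_median = median(list(ip_values) + list(input_values))
        aligned += [(v - block_median, True) for v in ip_values]
        aligned += [(v - block_median, False) for v in input_values]
    aligned.sort(key=lambda pair: pair[0])
    return sum(rank for rank, (_, is_ip) in enumerate(aligned, start=1) if is_ip)
```

Alignment removes probe-to-probe offsets: two blocks whose raw intensities differ by a constant contribute identically to W once their medians are subtracted.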

RNA arrays. RNA samples were isolated from HeLa S3 cells and purified with TRIzol
(Invitrogen) and RNeasy (Qiagen). RNA was amplified and hybridized to Affymetrix
U133 Plus 2 arrays using standard methods. Three biological replicates were quantile
normalized. Gene expression was indicated by the median of PM-MM values over all
probes. The hypothesis of difference in gene expression between groups of genes, based on
median PM-MM, was tested using the Wilcoxon rank sum statistic. For hybridization to the
ENCODE tiled array, RNA was similarly isolated and double-stranded cDNA was generated
using the Invitrogen SuperScript cDNA synthesis kit. 1-1.5 µg of cDNA was hybridized to the
tiled  array.  Three  biological  replicates  were  performed  for  each  RNA  array. 

Genomic Annotation. Sites were determined to be near a genomic annotation if they were
within the apparent 1000 bp resolution. Sites shorter than 1000 bp in length were scaled in
size to include 1000 bp around the center of the site. For sites longer than 1000 bp, the
data-determined length was used as the resolution size. Databases were downloaded from the
UCSC Golden Path Genome Browser and loaded into a local MySQL database. Exons were
compared and classified as one or more of the following: Start, Terminal, Alternatively
Spliced, Constitutive or Cassette. Because the arrays were designed using the HG15
assembly, the data were compared to this version of the human genome unless otherwise
noted. The active gene list was defined as those with PolIIa at the first exon of the gene.

Real-time PCR. PCR primer pairs were designed to amplify 100 bp fragments from
selected genomic regions (Supplemental Table 8). Each real-time PCR reaction contained
50 nM primers, ~1 ng of DNA and 1x ABI SYBR PCR Reaction Mix. A fluorescence value
proportional to the initial quantity of target DNA was calculated by a log-linear regression
analysis for each quadruplicate amplification curve [27]. We normalized this value to an
input chromatin sample, then normalized this ratio to a reference gene, PAPT, which is not
expressed in HeLa cells, to calculate a relative enrichment value for the target:
(Target_IP / Target_Input) / (PAPT_IP / PAPT_Input).
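
The double normalization above can be written as a short function. This is a sketch only: averaging the quadruplicate values with an arithmetic mean is an assumption, and the function and argument names are hypothetical.

```python
from statistics import mean

def relative_enrichment(target_ip, target_input, ref_ip, ref_input):
    """(Target_IP / Target_Input) / (Ref_IP / Ref_Input).

    Each argument is a list of replicate quantities (for example the four
    values from quadruplicate amplification curves); replicates are averaged
    with an arithmetic mean, which is an assumption of this sketch.
    """
    target_ratio = mean(target_ip) / mean(target_input)
    reference_ratio = mean(ref_ip) / mean(ref_input)
    return target_ratio / reference_ratio
```

A value near 1 indicates that the target is recovered in the IP no better than the non-expressed reference gene, while larger values indicate enrichment.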

Supplemental  Data 

1. PolIIa annotated to Refseq.
2. PolIIa annotated to known genes.
3. PolII annotated to Refseq.
4. PolII annotated to known genes.
5. PolII annotated to genscan exons.
6. knownGene and Refseq populations on the ENCODE array.
7. The PolIIa defined active gene list.
8. PCR primer list and annotation.

Figure  Legends 

Figure  1.  Enrichment  of  Selected  Genomic  Regions  in  A)  PolII  ChIP  B)  PolIIa  ChIP. 
By real-time PCR relative enrichment ratios, selected regions are more often found to be
enriched when their p-values are below 10^-4. These regions include both intra- and
intergenic locations as listed in Supplemental Table 8.

Figure 2. RNA Polymerase II shows a variety of gene-specific enrichment patterns. Graphs
plot -10 log10(p-value) mapped to chromosome position with the significant values greater
than 40 (p < 10^-4) indicated by the rectangle blocks below the graph. Values are plotted at
every probe location. Flat lines indicate weak p-values and gaps indicate the absence of
probes. The high density of probes across these genes suggests that the observed patterns
are not due to probe bias. A scale bar is shown for each panel to reflect the different gene
lengths displayed. RefSeq genes and known genes are annotated in green and blue, respectively,
with thick bars representing exons and thin lines introns. Genes above the white bar are
ordered 5'-3' while those below the white bar are 3'-5'. A. On the highly expressed SF1
gene, PolIIa localizes to the first exon only. PolII accumulates across the gene with a
distinctive pattern. B. No significant signal is observed across the inactive NRXN2 locus
that is near SF1 on chromosome 11. Graphs are plotted on the same scale as seen in A.
C. The moderately expressed gene, KIAA1932, also reveals distinct accumulations across
the gene. The red box highlights alternatively spliced exons. At the 3' end of the gene,
some  PolIIa  signal  is  observed,  probably  indicative  of  the  expression  of  a  small  gene 
antisense  to  the  KIAA1932  gene.  D.  Another  commonly  observed  pattern  is  exemplified  by 
the  EHD1  gene.  Both  polymerase  antibodies  recognize  the  first  exon  but  no  other 
significant  signal  is  observed  across  the  gene  until  the  3'  end  of  the  gene. 
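
For readers converting between the plotted values in Figure 2 and p-values, the transform is sketched below. The function names are hypothetical. The only facts taken from the legend are the -10 log10(p-value) scale and the significance cutoff of 40.

```python
import math

def plot_score(p_value):
    """-10 * log10(p), the quantity plotted on the vertical axis."""
    return -10 * math.log10(p_value)

def p_from_score(score):
    """Invert the transform: p = 10 ** (-score / 10)."""
    return 10 ** (-score / 10)
```

On this scale a plotted value of 40 corresponds to a p-value of 10^-4, the threshold used to call enriched sites.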

Figure  3.  Different  RNA  Polymerase  states  show  distinct  exon  biases.  Pie  charts 
representing  the  percentage  of  exons  in  each  category  at  RNA  Polymerase  enrichment 
locations.  These  include  exons  from  enrichment  locations  that  include  more  than  one  exon. 
Hypophosphorylated  polymerase  is  strongly  biased  towards  transcription  initiation  locations. 
Most of the internal exons are second exons overlapping with first exons. The
phosphorylation-independent antibody recognizes RNA Polymerase II at both transcription
initiation  and  termination  locations  with  a  slight  bias  towards  termination  locations. 

Figure  4.  Low  p-value  PolII  and  PolIIa  enrichments  are  biased  towards  higher  mRNA 
levels.  The  plot  depicts  the  observed  intensity  from  Affymetrix  U133  Plus  2  chips 
compared  to  different  p-values  of  PolII  (white)  and  PolIIa  (gray).  Some  genes  with  no 
significant  PolII  enrichment  show  high  levels  of  observed  intensity. 

Figure  5.  RNA  Polymerase  II  enrichment  is  not  always  within  annotated  gene  boundaries 
as seen in views from the UCSC genome browser, genome version HG16. PolIIa is in black
and PolII is in blue with 4 rows for each representing the data at different p-values: p<10^-5,
p<10^-4, p<10^-3, and p<10^-2 from top to bottom. RNA signal is in red. Panels A and B show
PolII extending beyond the 3' end of the annotated gene. Panels C and D show RNA
Polymerase II signal in putative intergenic regions with RNA signal also observed
in the vicinity. Panel D covers chr11:285,000-290,000. These regions are conserved and
are also near predicted genscan exons. These novel sites not in the gene regions were
confirmed by PCR.

Table 1. Summary of RNA Polymerase II locations. The order indicates the flowchart of
filtering  through  the  different  databases.  Enrichment  sites  were  first  compared  to  the  Refseq 
database.  Sites  that  are  not  near  exons  were  then  divided  into  two  categories:  1)  Locations 
that  are  in  Refseq  introns.  2)  Locations  that  are  not  in  a  Refseq  gene  are  then  compared  to 
knownGene  and  predicted  gene  databases.  For  both  RNA  Polymerase  II  phosphorylation 
states,  the  large  majority  of  sites  are  near  an  exon. 


                            Pol IIa   Pol II
Total Sites                     102      550
Refseq total exons               70      289
Refseq first exons               63       75
Refseq terminal exons             2       91
Refseq internal exons             5      123
Refseq introns                    4      120
knownGene exon                    0        5
genscan exon                      1       23
geneid or sgpGene                 0        3
Active gene introns               2       57
Inactive introns                  1       32
No Refseq overlap                28      141
knownGene total exons             5       38
knownGene first exon              5       13
knownGene terminal exon           0        4
knownGene internal exon           0       21
No Refseq or knownGene           23       90
genscan exons                     7       43
geneid or sgpGene                 2        6

Flow cytometry, in combination with advances in bead coding
technologies,  is  maturing  as  a  powerful  high-throughput  approach 
for  analyzing  molecular  interactions.  Applications  of  this 
technology  include  antibody  assays  and  single  nucleotide 
polymorphism  mapping.  This  review  describes  the  recent 
development  of  a  microbead  flow  cytometric  approach  to  analyze 
RNA-protein  interactions  and  discusses  emerging  bead  coding 
strategies  that  together  will  allow  genome-wide  identification  of 
RNA-protein  complexes.  The  microbead  flow  cytometric  approach 
is  flexible  and  provides  new  opportunities  for  functional  genomic 
studies  and  small-molecule  screening. 

Keywords  Flow  cytometry,  functional  genomics, 
microbeads,  microspheres,  RNA-protein  interactions 

Introduction 

The  determination  of  RNA-protein  regulatory  networks  is 
critical  for  understanding  biological  pathways.  The  role  of 
RNA  and  RNA-protein  interactions  in  regulating  gene 
expression  is  becoming  more  appreciated  with  each  new 
discovery.  RNA-protein  interactions  are  the  backbone  of 
many  post-transcriptional  processes,  including  mRNA 
stability,  splicing,  translation  and  localization.  Determining 
which  RNAs  and  proteins  interact  remains  a  challenging 
goal  in  the  post-genomics  era.  Many  human  diseases  such  as 
fragile  X  [1]  and  HIV  [2]  are  controlled  by  proteins 
interacting  with  RNAs.  Proteins  also  form  complexes  with 
both  large  and  small  (eg,  7SK)  non-coding  RNAs  [3,4]  and 
microRNAs [5] to regulate gene expression. How RNA-protein
interactions shape gene expression pathways on a genome-wide
level remains unclear.

This  review  highlights  recent  advances  in  technologies  to 
study  RNA-protein  interactions  using  genomic  and  high- 
throughput  methods.  In  particular,  we  will  focus  on  the  use 
of  microbeads  to  explore  RNA-protein  complexes  by  flow 
cytometry.  These  methods  could  evolve  into  diagnostic 
assays  and  high-throughput  screens  of  pharmacological 
agents targeted to RNA-protein interactions. In this review,
the assay will be introduced, and aspects of microbead
technology important for the assay, such as microbead
multiplexing and surface chemistry, will be discussed.

RNA-protein  screening  approaches 

Many assays have been developed to examine nucleic
acid-protein interactions in vitro, including gel mobility shift,
footprinting and filter binding. Hazbun and Fields
performed a large-scale electrophoretic gel mobility shift
assay (EMSA) to monitor DNA binding proteins from pools
of glutathione-S-transferase (GST) yeast protein libraries [6].
However, similar to the other biochemical strategies, EMSA
requires  many  manipulations  making  genome-wide 
screening  labor  intensive  and  complicated.  In  addition,  these 
approaches  require  labeling  of  the  RNA  to  monitor  binding, 
making  it  difficult  to  pick  a  particular  protein  and  determine 
the  specifically  binding  RNAs. 

A  number  of  genetic  methods  have  been  developed  for  the 
analysis  of  RNA-protein  interactions.  One  system  that  can 
screen  for  either  RNA  binding  proteins  or  for  RNA  sequences 
is  the  three-hybrid  assay  [7].  However,  long  RNA  sequences 
cannot  be  analyzed  and  certain  sequences  cause  transcription 
termination [7]. A second genetic strategy is the Translational
Repression Assay Procedure (TRAP) in yeast. This strategy
works well with hairpin-containing RNA binding sites but
has yet to be tested with a variety of RNA structures [8]. More
recently,  phage  display  methods  have  been  developed  with  a 
model  system  to  clone  candidate  proteins  binding  to  a  specific 
RNA  sequence  [9].  Genetic  methods  in  mammalian  cell  lines, 
such  as  the  Tat-fusion  transcriptional  activation  system  [10] 
and  frameshifting  assay  [11],  offer  the  ability  to  screen  in  the 
presence  of  potential  binding  partners.  One  drawback  of  these 
methods  is  that  the  complexes  are  forced  to  form  in  particular 
cellular  compartments  that  may  not  be  the  native  location.  In 
addition,  they  often  depend  upon  the  generation  of  cDNA 
libraries  that  may  be  biased  towards  the  most  abundant 
messages  and  would  also  miss  non-coding  RNAs  such  as 
microRNAs. 

Recently,  DNA  chips  have  been  used  to  identify  RNAs 
bound to proteins [1,12,13,14]. This approach is
promising  for  the  investigation  of  RNA-protein  interactions 
on  a  genome-wide  scale.  Typically,  RNA-protein  complexes 
are  immunoprecipitated  and  the  RNA  is  isolated  and 
analyzed  on  DNA  chips.  Alternatively,  protein  can  be 
prepared  on  beads  and  cell  extract  can  be  bound  to  the  bead 
[15].  However,  these  approaches  rely  on  the  ability  to 
preserve  stable  interactions  during  immunoprecipitation; 
many  potentially  weak  interactions  may  be  lost.  In  addition, 
RNA  binding  proteins  typically  have  high  non-specific 
binding  constants  leading  to  the  isolation  of  a  mixture  of 
specific  and  non-specific  'bound'  species,  complicating  the 
analysis. Other experiments such as systematic evolution of
ligands by exponential enrichment (SELEX) may be necessary
to help determine the specifically binding RNAs [16].

The  microbead  assay 

A new approach to RNA biochemistry uses flow cytometry
and oligonucleotides attached to microbeads (Figure 1)
[17]. A fluorescently labeled protein is bound to
fluorescently  labeled  RNA  before  being  challenged  with 
oligonucleotides  attached  to  microbeads.  After 
hybridization,  under  conditions  that  do  not  disrupt  the 
RNA-protein  interaction,  the  microbeads  are  sorted  and 
analyzed  by  flow  cytometry.  The  presence  or  absence  of 
RNA  and  protein  signals  provides  binding  interaction 
information.  RNA-protein  interactions  can  be  specifically 
identified  from  complex  mixtures  while  simultaneously 
characterizing  binding  properties,  such  as  the  dissociation 
constant (Kd). In addition, by probing with a high-density
oligonucleotide  library  against  RNAs  of  interest,  the  binding 
site  could  be  determined. 


Figure  1.  Schematic  of  the  microbead  assay. 


A fluorescently labeled RNA binding protein (RBP)-RNA complex is
formed and subsequently challenged with oligonucleotide beads.
After reaching equilibrium, RNA and protein fluorescence on each
microbead is determined by flow cytometry. The experiment can be
performed with or without fluorescently labeled RNA. Three
scenarios are possible: (i) both RNA and protein fluorescence
signals are observed, indicating that the bead is coupled to an
oligonucleotide complementary to an RNA molecule that is binding
the RBP; (ii) no protein fluorescence signal is observed but the
oligonucleotide hybridizes to the RNA (with labeled RNA, this
RNA-oligonucleotide hybridization is detected directly); these
oligonucleotides may be complementary to the RBP binding site and
compete for RBP binding; and (iii) neither protein nor RNA
fluorescence is observed, suggesting that these oligonucleotides do
not hybridize to the RNA. These sequences may be
non-complementary to the RNA.
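
The three outcomes listed in the Figure 1 legend amount to a simple decision rule on the two fluorescence channels. The sketch below illustrates that rule only; the `Bead` type, the cutoff parameters and the returned labels are hypothetical names chosen for illustration, and in practice the cutoffs would have to be set from control beads.

```python
from dataclasses import dataclass


@dataclass
class Bead:
    rna: float      # RNA-channel fluorescence, arbitrary units
    protein: float  # protein-channel fluorescence, arbitrary units


def classify(bead, rna_cutoff, protein_cutoff):
    """Assign a bead to one of the scenarios in the Figure 1 legend.

    The cutoffs are hypothetical thresholds separating signal from
    background in each channel.
    """
    has_rna = bead.rna > rna_cutoff
    has_protein = bead.protein > protein_cutoff
    if has_rna and has_protein:
        return "i: oligonucleotide hybridizes to RNA bound by the RBP"
    if has_rna:
        return "ii: oligonucleotide hybridizes to RNA, no RBP signal"
    if not has_protein:
        return "iii: no hybridization detected"
    # Protein without RNA is not one of the three expected outcomes.
    return "unexpected: protein signal without RNA signal"


beads = [Bead(900.0, 750.0), Bead(880.0, 12.0), Bead(9.0, 11.0)]
for bead in beads:
    print(classify(bead, rna_cutoff=100.0, protein_cutoff=100.0))
```

The fourth branch is included because protein signal without RNA signal would point to non-specific protein binding to the bead, which the legend does not list but the surface chemistry discussion later treats as a practical concern.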


The RNA-protein microbead assay was developed with the
U1 snRNP model system. U1-green fluorescent protein
(GFP) was purified and bound to a 150mer RNA; binding is
indicated by GFP fluorescence on the bead population
(representative flow cytometry data are shown in Figure 2).
RNA mutations, oligonucleotide mismatches and
dissociation constants were measured to demonstrate the
specificity of the assay. Single mismatch discrimination of
short oligonucleotides was possible when the signal was
monitored through the protein binding. Importantly, RNAs
could be specifically detected in total RNA isolated from
cells. The sensitivity is in the range of other common flow
cytometry assays, since picomolar RNA concentrations
could be detected. In this format, the assay is accessible to
most molecular biology laboratories as it uses common
reagents, and many facilities have access to flow cytometers.

Figure  2.  Flow  cytometry  data  showing  GFP  fluorescence  on 
microbeads. 


U1-GFP  is  bound  to  a  150mer  RNA  which  includes  a  stem-loop  binding 
site.  Histograms  show  the  number  of  beads  at  different  GFP 
fluorescence  intensities.  In  the  presence  of  RNA  and  U1-GFP,  the 
fluorescence  intensity  of  the  bead  population  increases  and  a  more 
homogeneous  bead  population  is  observed,  as  shown  by  comparison  of 
panels  2,  3  and  4.  The  different  expected  outcomes,  as  outlined  in 
Figure  1,  are  shown:  (i)  oligonucleotide  I  is  complementary  to  the  RNA 
distant  from  the  binding  site;  (ii)  oligonucleotide  II  is  complementary 
to  the  loop  of  the  stem-loop  and  competes  with  U1-GFP  binding; 
and  (iii)  oligonucleotide  III  is  a  non-complementary  oligonucleotide. 
Oligonucleotides  II  and  III  show  non-specific  binding  similar  to  the 
background, as shown by comparison of panels 2, 5 and 6.


The  microbead  assay  is  an  equilibrium  binding  assay  that 
offers  some  distinct  advantages  for  the  biochemical 
characterization  of  RNA-protein  complexes.  Firstly,  protein 
binding  to  large  RNAs  can  be  examined.  In  fact,  larger  RNAs 
offer  more  hybridization  targets  for  the  antisense 
oligonucleotide  probes.  Using  oligonucleotides  targeting 
different  regions  of  the  RNA,  binding  can  be  monitored  across 
the  whole  RNA  molecule.  Also,  binding  reactions  could  be 
performed  in  the  presence  of  potential  co-operative  binding 
partners  by  using  cell  lysates  or  partially  purified  cell 
fractions.  Since  binding  can  be  monitored  at  different 
locations  across  the  RNA  molecule,  similar  to  footprinting 
assays,  specific  and  non-specific  sites  may  be  differentiated. 
This  may  allow  non-specific  binding  sites  to  be  differentiated 
in  genomic  screens  as  the  assay  can  monitor  interactions  from 
the  picomolar  to  nearly  micromolar  dissociation  constant 
range.  Thus,  weak  interactions  can  be  monitored  and 
potentially  discriminated  from  non-specific  interactions. 
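
Because the assay is read at equilibrium, a dissociation constant can in principle be estimated from bead fluorescence measured across a protein titration. The following is a minimal sketch, not the authors' analysis: it assumes the standard single-site isotherm (fraction bound = [P] / (Kd + [P])) with RNA well below Kd, and the function names, units and grid search are illustrative choices.

```python
def fraction_bound(protein_nM, kd_nM):
    """Single-site binding isotherm, assuming RNA concentration << Kd."""
    return protein_nM / (kd_nM + protein_nM)


def fit_kd(protein_nM, signal, kd_grid):
    """Return the Kd on kd_grid that best explains the fluorescence signal.

    For each candidate Kd, the maximum signal (amplitude) is solved by
    linear least squares, and the Kd with the smallest residual is kept.
    """
    best_kd, best_error = None, float("inf")
    for kd in kd_grid:
        theta = [fraction_bound(p, kd) for p in protein_nM]
        amplitude = (sum(s * t for s, t in zip(signal, theta))
                     / sum(t * t for t in theta))
        error = sum((s - amplitude * t) ** 2 for s, t in zip(signal, theta))
        if error < best_error:
            best_kd, best_error = kd, error
    return best_kd


# Synthetic, noise-free titration generated with Kd = 5 nM.
protein = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
signal = [1000.0 * fraction_bound(p, 5.0) for p in protein]
grid = [10 ** (i / 20) for i in range(-40, 61)]  # 0.01 nM to 1000 nM
print(fit_kd(protein, signal, grid))
```

A log-spaced grid is used here because the text quotes a working range from picomolar to nearly micromolar dissociation constants, which spans several orders of magnitude.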

The  assay  requires  fluorescent  labeling  of  the  RNA  and/or 
protein. A number of strategies have been, and continue to
be, developed to label proteins with minimal
disruption to their structure and function. The first
generation of the RNA-protein bead assay uses a GFP
fusion. Other strategies include the use of antibodies, where
a fluorescently labeled antibody against a protein is used to
monitor binding [18]. Screening in yeast has found that
many proteins are functional when either their N- or
C-terminus is tagged [19]. Advances in in vitro translation
may  also  allow  site-specific  labeling  at  the  beginning  or  end 
of  the  protein  as  seen  in  both  Promega  and  Roche  systems. 
These  strategies  will  increase  the  probability  of  obtaining 
functionally  active,  fluorescently  labeled  protein. 

Flow  cytometry 

Flow  cytometry  is  a  powerful,  sensitive  and  quantitative 
technology  used  to  measure  molecular  interactions.  Flow 
cytometry  has  been  successfully  applied  to  examine  various 
protein-protein,  protein-DNA  and  DNA-DNA  interactions 
[20,21].  As  it  is  fluorescence  based,  it  can  also  be  adapted  to 
monitor  real-time  kinetics  and  rapid  quench  studies.  Very 
high  sensitivity  can  be  obtained  with  10'  to  10'  particles/ml 
and  target  concentrations  in  the  10  to  100  pM  range,  well 
below the Kd of most RNA-protein interactions. Since flow
cytometry  can  focus  on  just  the  signal  on  the  microbeads 
and  not  the  unbound  molecules  in  solution,  typically,  no 
washing  is  required,  saving  significant  effort.  Also,  recent 
advances  in  coding  microbeads  are  bringing  the  power  of 
multiplexing  dozens  of  samples  simultaneously  to  these 
assays.  With  the  ability  to  use  automatic  sample  loaders 
running  at  two  to  three  samples/min,  high-throughput  plate 
reading  is  now  feasible.  Recent  reviews  highlight  the  latest 
technical  advances  in  flow  cytometry,  allowing  for  high 
throughput and sensitivity [20-22].

Microbeads versus microarrays and hybridization

Binding  to  microbeads  instead  of  microarrays  offers  a 
number  of  potentially  significant  advantages.  Microbeads 
have  proven  to  be  useful  for  sensitive  and  rapid 
bioanalytical  assays.  Companies  such  as  Luminex  Corp, 
Lynx  Therapeutics  and  Quantum  Dot  Corp  have  taken 
advantage  of  these  properties  to  devise  high-throughput 
approaches  to  immunoassays  [23],  sequencing  [24,25]  and 
single nucleotide polymorphism mapping [26,27].
Importantly,  microbead  assays  are  typically  cost-effective, 
fast  and  require  minimal  sample  quantities. 

An important advantage of microbeads for the RNA-protein
binding assay is the ability to perform binding on a surface
that more closely resembles solution conditions.
Hybridization on large planar surfaces is limited by mass
transport. On the other hand, microbeads offer better
diffusion characteristics, leading to significantly improved
hybridization kinetics and thermodynamics [28,29].

The  basic  approach  of  the  microbeads  assay  is  also  applicable 
to microarrays. However, specific hybridization under
physiological conditions is a requirement of the assay. Due to
the  demanding  hybridization  requirements  and  the  relative 
ease  in  synthesizing  oligonucleotides  with  long  linkers  to 
readily  available  microbeads,  the  microbead  approach  offers  a 
simple  alternative  to  microarrays. 

Encoding  strategies 
Bead  libraries 

Most high-throughput bead-based libraries use the optical
properties of the support as the library code. The exception
to this is the approach from Lynx Therapeutics, who utilize
non-encoded support beads, and a series of molecular
markers and identifiers [24]. Optical encoding of supports
falls into two broad categories. The first (Luminex Corp,
Quantum Dot Corp, Illumina and Nanoplex) is based on
separately coding each bead and separately synthesizing the
target DNA sequence (or other analyte such as RNA or
peptide), then attaching each target to a coded bead. The
alternative technique is to directly synthesize the target
molecule on a coded bead in a combinatorial manner and
track every synthetic step each individual bead experiences
[30].

Separate  encoding 

Methods that use the separate encoding strategy employ
similar approaches to encode the beads. In each case,
fluorochromes, either fluorescent dyes (Luminex [31] and
Illumina [32]) or fluorescent nanocrystals (Quantum Dot Corp
[27,33]), are incorporated into polystyrene beads by
swelling  the  polystyrene  in  a  solvent  and  absorbing  dyes  or 
nanocrystals  into  the  particles.  The  bead  is  then  placed  in  a 
different  solvent  to  shrink  the  bead,  trapping  the 
fluorochrome  in  the  bead.  The  code  is  formed  by  varying  the 
concentration  and  the  combination  of  fluorochromes  present 
in each bead. The code can be read either by a flow
cytometer (Luminex Corp, Quantum Dot Corp) or by optic
fiber array (Illumina). DNA sequences are synthesized
remotely (either separately in an automated DNA
synthesizer or in vivo) and attached to the beads using
standard 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC)
coupling chemistry [34]. The separate encoding techniques are useful
for  small  libraries  since  it  is  easy  to  separately  synthesize 
hundreds  of  different  beads  and  hundreds  of  target 
molecules  (Table  1).  However,  there  are  limitations  for 
larger  libraries  [35].  To  synthesize  a  library  of  100,000 
compounds  requires  100,000  separate  coded  beads  and 
100,000  separately  synthesized  DNA  sequences  combined  in 
100,000  coupling  reactions  (Figure  3).  Automation  of  this 
process  is  possible;  however,  the  size  of  the  library  is  still 
limited  by  the  number  of  coded  beads  that  can  be  formed. 

Nanoplex uses metallic rods (instead of spherical particles)
with bands of material of different refractive index to
form a coding system, which is similar to traditional
barcodes but on a microscopic scale [36]. The difference in
refractive  index  is  achieved  by  incorporating  different 
metals  into  the  rods  as  they  are  synthesized.  Similar  to  the 
other  separate  encoding  strategies,  library  size  is  limited 
by  the  number  of  separate  reaction  vessels  required  to 
synthesize  the  coded  support  and  the  analyte.  However, 
unlike  the  fluorescent  coding  approach,  the  barcode  can  be 
incorporated  over  a  large  number  of  steps,  so  the  coding 
system  can  code  for  many  more  sequences  than  it  would 
be  possible  to  synthesize  in  the  library.  At  this  time,  there 
is  no  automated,  high-throughput  method  of  reading  these 
barcodes. 

Combinatorial  encoding/synthesizing 

In the combinatorial method, a set of optically diverse, but
distinguishable particles is used as the support for
synthesizing the target DNA (or other target molecules) [37].
The optically diverse set of particles is synthesized using
a combinatorial process where beads are split into a number
of reaction vessels and varying concentrations of
fluorophores, such as organic fluorescent dyes or
nanoparticles, are covalently incorporated into the beads.
The beads are then mixed together and the process is
repeated for each subsequent dye; thus, it is not necessary to
synthesize each coded bead individually [30]. Using this
method with six fluorophores and eight levels of intensity
for each dye, a library of over 250,000 signatures can be
constructed. Still using only six fluorophores, but with 16
levels of intensity, a library of over 16 million sequences can
be generated (Table 1).

Table 1. Comparison of microbead coding strategies.

DNA microarray
  Encoding method: Positional encoding. Probes are immobilized in
    spatially resolved sites on a two-dimensional support.
  Decoding method: Via position in array.
  Library size: < 10^6 probes [42].

Separate encoding (fluorochromes)
  Encoding method: Non-permanently stained polymer beads with up to
    four fluorochromes.
  Decoding method: Flow cytometry [31,33], optical fiber arrays [30],
    digital imaging.
  Library size: 100 to 270,000 probes.

Separate encoding (metallic rods)
  Encoding method: Layered metallic strips on rod shaped particles.
  Decoding method: Microscopic imaging (not automated).
  Library size: >100 probes [36] (can potentially code 10^13 but
    library size is limited by decoding rate and library synthesis).

Combinatorial encoding
  Encoding method: The unique optical signature of each
    multi-fluorescent support bead is tracked by a flow cytometer
    during the combinatorial synthesis of the probe.
  Decoding method: The optical signature is analyzed by flow
    cytometry and the reaction history of the bead is determined by
    recalling data stored by the flow cytometer software during
    probe synthesis.
  Library size: >10^8 probes [30].

Figure 3. Comparison of encoding techniques.

(A) Separate encoding strategy where beads are individually coded in separate reaction vessels and the oligonucleotides are individually
synthesized remotely. The oligonucleotide is coupled to the bead using standard EDC chemistry.

(B) Combinatorial encoding strategy where silica particles are coded using a split and mix process with varying concentrations of dyes. Using
a customized flow cytometer, the particles are sorted into four reaction vessels (one for each base) according to predetermined parameters.
The process is repeated until oligonucleotides of the required length are synthesized.
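
The library sizes quoted for the combinatorial method follow from counting every combination of intensity levels across the fluorophores. A short sketch of that arithmetic, assuming each fluorophore's intensity level is chosen independently (the function name is illustrative):

```python
def signature_count(fluorophores, intensity_levels):
    """Number of distinct optical signatures when each fluorophore
    independently takes one of `intensity_levels` intensities."""
    return intensity_levels ** fluorophores


print(signature_count(6, 8))   # six dyes, eight levels each
print(signature_count(6, 16))  # six dyes, sixteen levels each
```

With six fluorophores this gives 8^6 = 262,144 and 16^6 = 16,777,216 signatures, consistent with the "over 250,000" and "over 16 million" figures in the text.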

Using  a  flow  cytometer  and  custom  designed  electronics, 
beads  can  be  analyzed  and  sorted  according  to  the  particular 
optical signatures [30]. Each bead has a predetermined
sequence  that  is  uploaded  to  the  modified  flow  cytometer 
and  sort  decisions  are  made  according  to  the  sequences  that 
are  required  for  the  particular  library.  The  flow  cytometer 
can  sort  into  four  directions,  with  each  direction 
corresponding  to  a  different  nucleoside.  After  each  'sort'  the 
nucleosides  are  coupled  to  the  corresponding  beads  and 
once  coupling  is  complete,  the  beads  are  mixed  together  and 
the  process  is  repeated  until  the  oligonucleotide  sequences 
of  the  required  length  are  synthesized.  At  the  end  of  the 
process,  beads  with  a  known  optical  signature  are 
synthesized  with  each  unique  signature  corresponding  to  a 
different  oligonucleotide  sequence  (Figure  3). 
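
The sort, couple and pool cycle described above can be simulated in a few lines. This is a sketch under simplifying assumptions: each bead is identified by a hashable optical signature, coupling is perfect, and the chemical direction of synthesis is ignored. The function name and data layout are illustrative, not taken from reference [30].

```python
def sort_and_couple(assigned):
    """Grow one oligonucleotide per bead by repeated sort/couple/pool cycles.

    `assigned` maps an optical signature (any hashable id) to the
    sequence that bead should carry; all sequences must share one length.
    """
    lengths = {len(sequence) for sequence in assigned.values()}
    if len(lengths) != 1:
        raise ValueError("all target sequences must have the same length")
    grown = {signature: "" for signature in assigned}
    for position in range(lengths.pop()):
        # Sort: one vessel per nucleoside, chosen from the bead's
        # predetermined sequence at this position.
        vessels = {base: [] for base in "ACGT"}
        for signature, target in assigned.items():
            vessels[target[position]].append(signature)
        # Couple: every bead in a vessel receives that vessel's base.
        for base, signatures in vessels.items():
            for signature in signatures:
                grown[signature] += base
        # Pool: all beads are mixed again before the next cycle.
    return grown


library = {(0, 1): "ACGT", (3, 2): "TTGA", (7, 7): "GCGC"}
print(sort_and_couple(library) == library)
```

However many beads are in the library, each cycle uses only four coupling vessels, so an oligonucleotide of length n needs 4n coupling reactions in total, in contrast to one coupling reaction per library member in the separate encoding strategy.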

Synthesizing  libraries  in  this  way  requires  beads  that  can 
withstand  the  relatively  harsh  conditions  of  DNA  synthesis. 
Polystyrene  beads  are  typically  not  suitable  for  this  process 
because  they  swell  and  leach  dye  during  the  synthesis 
procedure,  thus  catastrophically  altering  the  optical 
signature. Therefore, specially synthesized silica particles
are required [30].

Surface  chemistry 

The  surface  chemistry  of  the  beads  plays  an  important  role 
in  the  assay.  Non-specific  binding  of  proteins  to  the  beads  is 
a  larger  problem  than  the  non-specific  binding  of 
oligonucleotides  to  the  beads,  as  generally,  most  surfaces 
with  a  large  negative  charge  (eg,  silica  and  polystyrene 
surfaces)  have  relatively  low  non-specific  binding  of 
oligonucleotides  (due  to  repulsion  of  the  negatively  charged 
phosphate  backbone).  However,  as  proteins  have  positively 
and  negatively  charged  regions,  it  is  necessary  to  have  a 
surface  which  has  little  or  no  surface  charge  to  minimize 
non-specific  electrostatic  binding  and  molecule  adsorption 
[38]. Coating the surfaces with hydrophobic chains (such as
alkyl chains) is also not ideal, since many proteins have
hydrophobic regions that will also non-specifically bind to
the beads [39]. One solution is to add a large excess of a
protein (eg, an inexpensive and abundant protein such as
bovine serum albumin) that non-specifically binds to the
surface of the beads, limiting the non-specific binding of the
fluorescently  labeled  protein.  However,  there  is  a  limit  to  the 
effectiveness  of  this  procedure  and  it  is  desirable  to  have  a 
'biologically  silent'  surface  that  limits  the  non-specific 
binding  of  the  proteins. 

Much of the surface chemistry developed for protein-chips
can be applied to bead surfaces. Polyethylene glycol and
oligoethylene glycol surfaces have been used to minimize
the non-specific binding of proteins [39] to silica substrates.

Surface  density  of  the  probes  also  plays  an  important  role  in 
the  assay.  Clearly,  the  higher  the  number  of  probes  on  the 
beads,  the  higher  the  resultant  signal;  however,  overloading 
the  surface  introduces  problems.  It  is  possible  to  load  in 
excess  of  100  million  target  sequences  on  a  single  bead,  but 
at  this  very  high  surface  density,  steric  hindrance  can  affect 
the  hybridization  of  target  DNA  to  the  beads.  In  addition, 
false  hybridization  events  may  occur,  where  one  target  DNA 
strand hybridizes to multiple probe strands on the bead [40].
Similar findings have been observed on arrays [41].

High-throughput  screening  and  genomics 

This  review  describes  the  recent  development  of  a  versatile 
flow  cytometry  approach  to  examine  RNA-protein 
interactions.  The  emerging  bead-coding  and  surface 
chemistry technologies, in combination with novel assays
such as the microbead RNA-protein assay, will lead to new
small molecule and genomic screens. Due to the versatility
and  flexibility  of  flow  cytometry  and  the  RNA-protein  assay, 
many  variations  are  possible,  including  defining  the  binding 
spectrum  of  a  particular  RNA-binding  protein,  screening  a 
protein  library  for  binding  to  a  specific  RNA,  or  discovering 
small  molecules  that  inhibit  an  RNA-protein  interaction. 
With  the  increasing  understanding  of  the  importance  of 
RNA-protein  interactions  in  human  disease  and 
development,  the  contribution  of  these  promising 
technologies  is  expected  to  be  significant. 
A Microbead-based System for Identifying and
Characterizing RNA-Protein Interactions by
Flow Cytometry*

Alexander S. Brodsky‡§ and Pamela A. Silver


We present a high throughput, versatile approach to identify
RNA-protein interactions and to determine nucleotides
important for specific protein binding. In this approach,
oligonucleotides are coupled to microbeads and
hybridized to RNA-protein complexes. The presence or
absence of RNA and/or protein fluorescence indicates the
formation of an oligo-RNA-protein complex on each bead.
The observed fluorescence is specific for both the
hybridization and the RNA-protein interaction. We find that the
method can discriminate noncomplementary and mismatch
sequences. The observed fluorescence reflects the
affinity and specificity of the RNA-protein interaction. In
addition, the fluorescence patterns footprint the protein
recognition site to determine nucleotides important for
protein binding. The system was developed with the human
protein U1A binding to RNAs derived from U1 snRNA
but can also detect RNA-protein interactions in total RNA
backgrounds. We propose that this strategy, in combination
with emerging coded bead systems, can identify
RNAs and RNA sequences important for interacting with
RNA-binding proteins on genomic scales. Molecular &
Cellular Proteomics 1:922-929, 2002.


RNA-protein  interactions  are  a  central  component  of  post- 
transcriptional  regulation  at  multiple  levels  including  RNA 
processing,  transport,  and  translation.  The  sequenced  human 
genome  reveals  hundreds  of  potential  RNA-binding  proteins 
(1).  A  critical  step  toward  understanding  the  function  of  RNA- 
binding  proteins  is  to  identify  and  determine  how  they  interact 
with  their  target  RNAs. 

Several strategies have been developed to identify
RNA-protein interactions. Genetic approaches include three-hybrid
screens (2), phage display (3), and TRAP (translational
repression assay procedure) (4) to identify proteins that bind a
specific RNA. However, these strategies are generally not
applicable to larger RNAs and not suitable for determining
binding constants. SELEX (systematic evolution of ligands by
exponential enrichment) can identify high affinity RNA
sequences that may or may not reflect the biologically relevant
recognition site (5).

Recently immunoprecipitation has been combined with
microarray analysis to identify candidate RNAs bound to
proteins (6, 7). This approach is very promising for inspection of
RNA-protein interactions on a genome-wide scale. However,
it relies on the ability to preserve stable interactions during
immunoprecipitation; many potentially weak interactions may
be lost. In addition, extensive motif searching together with
additional experimentation may be required to validate the
biological significance of any interactors (8).

Recent advances in bead coding technologies to create
high complexity platforms are leading to the development of
new approaches for high throughput screening studies that
could be amenable to the study of RNA-protein interactions
(9-12). In principle, nucleic acid hybridization on microbeads
offers a number of advantages over DNA chips including
shorter hybridization times and better control of binding
conditions (13). Therefore, we have developed a new equilibrium
binding method on microbeads for elucidating the recognition
site of an RNA-binding protein on its cognate RNA. The
approach uses oligonucleotide-coupled microbeads and
fluorescently labeled RNAs and proteins to monitor RNA-protein
binding by flow cytometry. To develop the system for
screening RNA-protein interactions, we demonstrate how this
approach can be used to identify and characterize the
interaction between the spliceosomal protein U1A and a hairpin
derived from U1 snRNA as well as detect specific RNAs from
total RNA populations.

EXPERIMENTAL  PROCEDURES 

Plasmids— The U1A test transcript was constructed by annealing
overlapping oligos and ligating the annealed product into pDP19
(Ambion) to create plasmid pPS2702. The oligo sequences are:
AATTCTTTATCTTCAAAGTTGTCTGTCCAAGATTTGGACTTGTCCGGAGTGCAATGGACG,
AAGGACAAGCGTGTCTTCATCAGAGTTGACTTCACTCGAG,
GACAAGTCCAAATCTTGGACAGACAACTTTGAAGATAAAG, and
GATCCTCGAGTGAAGTCAACTCTGATGAAGACACGCTTGTCCTTCGTCCATTGCACTCCG.
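
Whether the four oligonucleotides listed above can anneal into a single duplex can be checked by comparing reverse complements. The sketch below assumes that the first two oligos form the top strand and the last two the bottom strand, and that the duplex carries four-nucleotide 5' overhangs (AATT and GATC, the kind left by EcoRI and BamHI digestion). That pairing is inferred from the sequences themselves, since the text does not describe the cloning sites; the constant and function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Sequences as listed in the text, with spaces and line-break hyphens removed.
OLIGO_1 = "AATTCTTTATCTTCAAAGTTGTCTGTCCAAGATTTGGACTTGTCCGGAGTGCAATGGACG"
OLIGO_2 = "AAGGACAAGCGTGTCTTCATCAGAGTTGACTTCACTCGAG"
OLIGO_3 = "GACAAGTCCAAATCTTGGACAGACAACTTTGAAGATAAAG"
OLIGO_4 = "GATCCTCGAGTGAAGTCAACTCTGATGAAGACACGCTTGTCCTTCGTCCATTGCACTCCG"


def revcomp(sequence):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]


def sticky_end_duplex(top, bottom, overhang=4):
    """True if the strands pair everywhere except `overhang` nucleotides
    left single-stranded at the 5' end of each strand."""
    return revcomp(bottom)[:-overhang] == top[overhang:]


top_strand = OLIGO_1 + OLIGO_2      # assumed top strand
bottom_strand = OLIGO_4 + OLIGO_3   # assumed bottom strand
print(sticky_end_duplex(top_strand, bottom_strand))
print(top_strand[:4], bottom_strand[:4])  # the two 5' overhangs
```

The third oligo is complementary to positions 5 to 44 of the first, so the nick in the bottom strand is offset from the junction between the two top-strand oligos, which is what lets the four pieces hold together before ligation.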

U1A-green fluorescent protein (GFP)^1 was PCR-amplified from
pPS2035 and ligated into pRSETB (Invitrogen) to create pPS2699. The
96A→G U1A point mutant was constructed by using Stratagene’s
QuikChange system to create pPS2703. 77C→G was constructed by
ligating annealed oligonucleotides into pDP19 as described above. All
constructs were verified by sequencing.

1 The abbreviation used is: GFP, green fluorescent protein.

Transcription and RNA Preparations— pPS2702 and pPS2703 were
linearized with EcoRI and subsequently transcribed with Ambion’s T3
polymerase Maxiscript kit. Labeling with 32P verified a product of the
expected size, and subsequent transcription reactions were purified
by G-50 spin columns or phenol extractions followed by multiple
ethanol precipitations. Texas Red-5-UTP (Molecular Probes) was
incorporated during transcription and purified with G-50 spin columns.
Total RNA from HeLa cells was prepared by the TRIzol method with
high salt precipitations to reduce background GFP fluorescence.
Yeast RNA was isolated by a hot phenol method. RNA concentrations
were determined by UV spectrometry.

U1A-GFP Purification— Cells were grown to 0.5 optical density
before induction with 1 mM isopropyl-1-thio-β-D-galactopyranoside
for 4 h. Cells were resuspended in 20 mM HEPES, 10 mM KCl, 0.1%
IGEPAL and lysed with lysozyme followed by sonication. After
centrifugation, lysate was applied to nickel columns, washed extensively,
and eluted with imidazole. Green fractions were pooled and dialyzed
into 10 mM HEPES, pH 7.6, 10 mM KCl, 0.1% IGEPAL. To remove the
histidine tag, 1.25 units/µl enterokinase were added and incubated
for >48 h at 25 °C. Enterokinase was removed with EKaway resin
(Invitrogen). U1A-GFP was dialyzed into storage buffer (20 mM
HEPES, 20 mM KCl, 0.1% IGEPAL, 10% glycerol). Concentrations
were determined by comparing U1A-GFP to bovine serum albumin on
Coomassie gels and by the Bio-Rad protein assay. Protein stored at
-80 °C bound RNA similarly to fresh preparations (data not shown).

Bead Preparation— Before coupling, Dynal 2.8-µm magnetic
streptavidin beads (M-280) were vortexed and/or sonicated to reduce
aggregation. Similar to Dynal's recommended protocols,
oligonucleotides were attached to beads with 1 nmol of
oligonucleotide/9 × 10⁶ beads/30 µl. Incubations longer than 5 h were
required to reach maximum oligonucleotide density (data not shown).
Similar procedures were used for the Spherotech 5.7-µm magnetic
streptavidin beads. Oligonucleotides were synthesized with a 12-carbon
spacer and 5' biotin from two different sources: Dana-Farber Cancer
Institute Molecular Biology Core Facilities and Integrated DNA
Technologies, Inc. Oligonucleotides from each source behaved similarly.

Bead Binding Assays— Binding was performed in 20 mM HEPES,
pH 7.5, 300 mM KCl, 0.1% IGEPAL, 10 ng/µl tRNA, 0.04 units/µl
SUPERase-In (Ambion), and 20 ng/µl bovine serum albumin unless
otherwise indicated. RNA was heated to 95 °C for 1 min and cooled
on ice, then mixed with U1A-GFP for at least 20 min at room
temperature before addition of 1 × 10⁵ oligonucleotide-coupled
beads. Binding reactions were incubated at 35 °C for at least 14 h
with constant rotation unless otherwise indicated. Shorter incubations
(<6 h) gave lower fluorescence.

Flow Cytometry and Data Analysis— A BD Biosciences Vantage
was used to sort beads with both GFP and Texas Red signals. A
FACScan was used to monitor GFP alone. Typically 5,000-10,000
beads were counted, and the peak channel, which indicates the
maximum height of the bead population, was used to estimate the
fluorescence intensity. To determine the percentage of the population
shifted to higher fluorescence, cut-offs were set relative to the
background fluorescence. Beads with fluorescence above the cut-off were
counted in the shifted population. For the binding curves, pro Fit
(Quantum Soft) was used to fit the fluorescence intensities to a
Langmuir isotherm.
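The two summary statistics described above can be sketched in a few lines of Python. This is only an illustration, not the cytometer software used in the study: the function names, the toy channel values, and the choice of the highest background channel as the cut-off are all assumptions made for the example.

```python
from collections import Counter


def peak_channel(channels):
    """Return the most populated channel (the mode) of a bead histogram."""
    counts = Counter(channels)
    # Ties are broken in favour of the lowest channel so the result is stable.
    return min(counts, key=lambda ch: (-counts[ch], ch))


def percent_shifted(channels, cutoff):
    """Percentage of beads whose fluorescence channel lies above the cut-off."""
    if not channels:
        raise ValueError("no beads recorded")
    shifted = sum(1 for ch in channels if ch > cutoff)
    return 100.0 * shifted / len(channels)


# Hypothetical channel readings, one value per bead.
background = [10, 11, 11, 12, 12, 12, 13, 14]
sample = [12, 30, 31, 31, 32, 32, 32, 33]

# Assumed rule for this sketch: cut-off = highest channel seen in background.
cutoff = max(background)

print(peak_channel(sample), percent_shifted(sample, cutoff))
```

In practice the cut-off would be set from a background control run under the same instrument settings, and thousands of beads rather than eight would be counted.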


RESULTS 

The experimental design uses oligonucleotides coupled to microbeads
to probe RNA-protein interactions and is outlined in Fig. 1A. To
carry out the analysis, a protein-RNA complex is first formed

Fig. 1. Experimental design. A, schematic of the experiment. A
U1A-GFP-RNA complex is formed and subsequently challenged with
oligonucleotide beads. After reaching equilibrium, RNA and protein
fluorescence on each microbead are determined by flow cytometry.
The experiment can be performed with or without fluorescently
labeled RNA. Three scenarios are possible. I, GFP signal is observed
indicating the bead is coupled to an oligo complementary to the RNA
target but distant from the U1A stem loop recognition site. II, no GFP
signal is observed, but the oligo is hybridizing to the RNA. With
labeled RNA, the RNA-oligo hybridization is detected. These oligos
may be complementary to the U1A binding site. III, beads with neither
GFP nor RNA fluorescence are observed, suggesting that these oligos
do not hybridize to the RNA. These sequences may be noncomplementary
to the RNA. B, predicted secondary structure of U1A
RNA constructed for these studies as determined by mFOLD (21).
U1A-GFP binds to the hairpin derived from U1 snRNA as indicated.
An A to G mutation (96A→G) reduces binding 1000-fold. Region 1 is
complementary to the binding site. Oligonucleotides complementary
to other regions of the RNA, distant from the binding site, are also
indicated. These sequences are predicted to hybridize to the RNA
while U1A-GFP is binding, allowing the U1A RNA-protein interaction
to be observed.

and  then  incubated  with  beads  to  which  oligonucleotides 
complementary  to  the  target  RNA  have  been  coupled.  In  the 
experiments  described  here,  RNA  is  labeled  with  Texas  Red, 
and the protein is a GFP fusion. After reaching equilibrium, the

Table I
Oligonucleotides used in this study
Mismatch nucleotides are in lowercase.

Name      Sequence

Oligonucleotides complementary to U1A binding site
1.20^a    TTGTCCGGAGTGCAATGGAC
1.17      GTCCGGAGTGCAATGGA
1.15      GTCCGGAGTGCAATG

Oligonucleotides complementary to regions distant from U1A binding site
2.20      AAGGACAAGCGTGTCTTCAT
2.17      GGACAAGCGTGTCTTCA
2.15      GGACAAGCGTGTCTT
3.17      TCAGAGTTGACTTCACT
3.15      TCAGAGTTGACTTCA

Mismatch oligonucleotides
2C.20^b   AAGGACAAcCGTGTCTTCAT
2C.17     GGACAAcCGTGTCTTCA
2C.15     GGACAAcCGTGTCTT
3C.17     TCAGAcTTGACTTCACT
3C.15     TCAGAcTTGACTTCA

U1 snRNA oligonucleotides
4.17      CCCTGCCAGGTAAGTAT
4G.17     CCCTGCgAGGTAAGTAT

^a The first number indicates the sequence that is being targeted,
while the second number indicates the oligonucleotide length,
e.g. 1.20 is an oligonucleotide complementary to region 1 with 20
nucleotides.

^b The letter indicates a point mutation in the oligonucleotide
disrupting hybridization to the RNA.


beads are analyzed in a flow cytometer for protein and RNA
fluorescence. Beads are sorted into different categories as
illustrated in Fig. 1A. 1) Beads with both GFP and RNA signals
represent the RNA-protein interaction. The oligonucleotides
on the beads hybridize to the RNA without interfering with
protein binding. 2) RNA signal alone indicates RNA hybridization
with no protein binding. The oligonucleotides on these
beads may be competing with the protein to bind the same
RNA sequences. 3) Some beads will have no detectable
fluorescent signal. These oligonucleotide-coupled beads contain
sequences that cannot hybridize to the RNA, including
those that are noncomplementary or contain point mutations.
To quantitate the data, the mean fluorescence and/or the
percentage of beads in the different categories is determined.
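The three categories can be expressed as a simple two-threshold classification. The sketch below is illustrative only: the cut-off values, the bead readings, and the extra "gfp_only" label (GFP without RNA signal, which is not one of the three categories in the text and is flagged separately here) are assumptions introduced for the example.

```python
from collections import Counter


def classify_bead(gfp, texas_red, gfp_cutoff, red_cutoff):
    """Assign one bead to a category from its two fluorescence readings."""
    has_gfp = gfp > gfp_cutoff
    has_rna = texas_red > red_cutoff
    if has_gfp and has_rna:
        return "complex"          # category 1: RNA hybridized and protein bound
    if has_rna:
        return "hybridized_only"  # category 2: RNA hybridized, no protein
    if has_gfp:
        return "gfp_only"         # outside the three categories; flagged
    return "no_signal"            # category 3: no hybridization detected


def category_percentages(beads, gfp_cutoff, red_cutoff):
    """Percentage of beads in each category; beads is a list of (gfp, red)."""
    if not beads:
        raise ValueError("no beads recorded")
    counts = Counter(
        classify_bead(g, r, gfp_cutoff, red_cutoff) for g, r in beads
    )
    return {name: 100.0 * n / len(beads) for name, n in counts.items()}


# Hypothetical (GFP, Texas Red) readings in arbitrary units.
beads = [(80, 60), (75, 55), (5, 50), (4, 3)]
percentages = category_percentages(beads, gfp_cutoff=20, red_cutoff=20)
print(percentages)
```

The cut-offs play the role of the quadrant boundaries drawn on a two-color plot and would be set from background controls.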

Development of the Bead Binding Assay— The system was
developed with the human splicing protein U1A binding to the
stem loop 2 derived from U1 snRNA. A U1A-GFP fusion
protein including the first 94 amino acids of the RNA
recognition motif was expressed with an N-terminal His6 tag and a
C-terminal GFP. The histidine tag was proteolytically cleaved
to generate functional U1A-GFP. A 145-nucleotide RNA
encoding random sequence and including the specific U1A
hairpin was designed. The predicted secondary structure is
shown in Fig. 1B. Gel shift mobility experiments confirmed
that U1A-GFP binds the U1A RNA with a dissociation
constant of ~35 nM in 150 mM KCl at 25 °C (data not shown),

Fig. 2. Fluorescence is RNA-dependent. Histograms show the
number of beads at different GFP fluorescence intensities. In the
presence of RNA and U1A-GFP, the fluorescence intensity of the
bead population increases, and a more homogeneous bead
population is observed (compare panels 2, 3, and 4). Oligonucleotides
targeting the binding site or not complementary to the RNA show
only nonspecific binding (compare panels 2, 7, and 8).


similar to that reported for the same 94-amino acid fragment
in the absence of the GFP (14). Oligonucleotides complementary
to different regions of the U1A RNA were designed as
illustrated in Fig. 1B and listed in Table I. Oligonucleotides
were synthesized with a 5' biotin label and attached to
streptavidin beads. Reproducible coupling conditions were
devised to ensure similar oligonucleotide concentrations per
bead as determined with 32P-labeled oligos (data not shown).
The oligonucleotide concentrations used in the binding assay
are estimated to be between 1 and 2 nM. For all experiments,
Dynal 2.8-µm diameter streptavidin beads were used unless
otherwise indicated.
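The naming scheme in Table I (target region, optional mutation letter, oligonucleotide length) and the lowercase mismatch convention lend themselves to a small consistency check. The following sketch is not part of the published analysis; the helper names are hypothetical, and only a subset of the Table I sequences is included.

```python
# Subset of Table I; lowercase letters mark mismatch nucleotides.
OLIGOS = {
    "2.17": "GGACAAGCGTGTCTTCA",
    "2C.17": "GGACAAcCGTGTCTTCA",
    "3.17": "TCAGAGTTGACTTCACT",
    "3C.17": "TCAGAcTTGACTTCACT",
    "4.17": "CCCTGCCAGGTAAGTAT",
    "4G.17": "CCCTGCgAGGTAAGTAT",
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def parse_name(name):
    """Split a name such as '2C.17' into (region, mutation letter, length)."""
    target, length = name.split(".")
    region = int(target[0])
    mutation = target[1:] or None
    return region, mutation, int(length)


def mismatch_positions(seq):
    """0-based positions of the lowercase (mismatch) nucleotides."""
    return [i for i, base in enumerate(seq) if base.islower()]


def reverse_complement(seq):
    """Sequence the oligo pairs with, written 5' to 3' in DNA letters."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


for name, seq in OLIGOS.items():
    region, mutation, length = parse_name(name)
    # The length encoded in the name should match the sequence length.
    assert len(seq) == length, name
    print(name, region, mutation, mismatch_positions(seq), reverse_complement(seq))
```

A check of this kind catches a name whose stated length disagrees with its sequence and shows where each mismatch sits relative to the ends of the oligonucleotide.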

After the RNA-protein complex is formed, oligo-coupled
beads are added. To reach equilibrium, incubations at 25 °C
or 35 °C for longer than 6 h were necessary (data not shown),
and typical incubations were at least 14 h. After reaching
equilibrium, GFP fluorescence on individual beads was
assessed in a flow cytometer. The RNA dependence and
specificity of the binding reactions were assessed as follows to
ascertain the validity of the approach.

The observed GFP fluorescence on the beads is RNA-dependent
as illustrated in Fig. 2A. Background U1A-GFP binding
to the beads is low, and the peaks are broad, indicative of
a relatively heterogeneous population (Fig. 2, panel 2). The
observed GFP fluorescence intensity increases with higher
U1A RNA concentrations, and the population is more
homogeneous as indicated by the narrower peak width (Fig. 2, panels
3 and 4). The observed signals were reproducible with
different RNA and protein preparations.

The assay accurately distinguishes the U1A RNA binding
site. Oligonucleotides complementary to the binding site
compete with U1A-GFP for the same RNA sequence and
thereby reduce observed GFP fluorescence. Only nonspecific
background GFP signal is observed for oligos complementary
to the binding site (Fig. 2, compare panels 2 and 8). On the
other hand, oligos complementary to sequences not part of
the U1A binding site show significant GFP signal (oligos 2.17
and 3.15, Fig. 2A, panels 3-6). These oligonucleotides are
hybridizing to the RNA without interfering with U1A-GFP
binding, thereby allowing the observation of the RNA-protein
interaction. As a control, oligonucleotides not complementary to
the RNA show background nonspecific signal (Fig. 2,
compare panels 2 and 7).

Decreasing the oligonucleotide length lowers the GFP
fluorescence intensity. 20-mers, 17-mers, and 15-mers all yield
significant fluorescence, while 10-mers complementary to the
same region do not (data not shown). Lower fluorescence is
consistently observed for 15-mers, such as oligo 3.15,
compared with 17-mers, such as oligo 2.17 (Fig. 2, compare
panels 3 and 4 to panels 5 and 6). These observations are not
limited to a particular region of the RNA or sequence.

Discrimination of Oligonucleotide Mismatches— The bead
assay discriminates between oligonucleotides that contain
mismatches under conditions that preserve RNA-protein
interactions. When mismatches in the middle of the complementary
sequence are introduced, the oligonucleotide yields
significantly lower GFP fluorescence. Mismatch discrimination
is not unique to a particular sequence as oligonucleotides
complementary to distinct regions show a significant difference
in GFP fluorescence (Fig. 3a, compare oligos 2.17 and
2C.17 and oligos 3.17 and 3C.17). Interestingly, unlike 15-mers
and 17-mers, 20-mers do not discriminate mismatches as
well (Fig. 3a, compare oligos 2.20 and 2C.20 with oligos 2.17
and 2C.17). Also, mismatches at the first or second position of
either end of the oligonucleotide are not discriminated as well as
those in the middle of the sequence (data not shown).

To verify the observed oligonucleotide mismatch
discrimination, a compensatory mutation in the RNA was constructed.
Binding reactions were prepared with the two different RNAs,
and beads coupled to either oligo 2.17 or 2C.17 were added.
As observed previously for the wild-type U1A RNA, oligo 2.17
shows significant GFP fluorescence, while oligo 2C.17 does
not (Fig. 3b). However, the compensatory mutation in the U1A
RNA, 77C→G, creates a mismatch for oligo 2.17 and disrupts
the hybridization, thereby reducing the observed GFP
fluorescence. Meanwhile, significant GFP fluorescence is observed
for oligo 2C.17, which is complementary to 77C→G RNA (Fig.
3b). Compensatory RNA mutations and subsequent U1A
binding have also been performed with oligos 3.17 and 3C.17
(data not shown). These “rescue” experiments further verify
the observed point mutant hybridization discrimination.

The observed mismatch discrimination is enhanced by
measuring hybridization through U1A-GFP binding. In the
absence of U1A-GFP, poor discrimination between oligonucleotide
mismatches is observed (Fig. 3c, compare panels D and
E). This is consistent with reports of poor hybridization
behavior of short oligonucleotide sequences (15, 16). However, in the
presence of U1A-GFP, the same oligo 2.17 beads show both
a higher GFP and Texas Red fluorescence intensity, while two
different mismatches of oligo 2.17 show significantly reduced
GFP fluorescence (Fig. 3c). Thus, in physiological conditions,
mismatch discrimination is observed by monitoring
hybridization through an RNA-protein interaction.

Detection of Specific RNA-Protein Interactions— To
determine whether the observed fluorescence is accurately
reflecting the U1A RNA-protein interaction, an A to G point mutation
in the U1A loop (96A→G) known to disrupt binding was tested
(14). This mutation severely reduces the observed GFP signal
as shown in Fig. 4A. To quantitate Texas Red and GFP
fluorescence, the percentage of the bead population shifted
beyond the signals observed for background binding is
determined. The quadrants shown in panels A-C of Fig. 3c
determine the cut-offs to define the bead populations with
different combinations of Texas Red and GFP fluorescence.
Oligos 2.17 and 3.17 show U1A-GFP signal with wild-type
U1A RNA, while only background fluorescence is observed
with 96A→G. Importantly, both oligonucleotides are
hybridizing to the RNA as indicated by significant Texas Red
fluorescence, suggesting that the lower GFP signal is due to disruption
of the protein-RNA complex and not reduced hybridization.

As predicted, oligo 1.17, complementary to the U1A binding
site, does not show any significant U1A-GFP binding, similar
to the experiments described above. Importantly, oligo 1.17 is
hybridizing to the RNA at levels similar to other
oligonucleotides where GFP fluorescence is observed. This suggests that
the low observed GFP fluorescence is due to disruption of the
RNA-protein complex and not poor hybridization. These data
demonstrate the ability of this system to define sequences
important for protein recognition on the RNA by footprinting.

To further demonstrate that the observed GFP fluorescence
accurately reflects the RNA-protein interaction, the affinity of
the U1A complex was measured (Fig. 4B). A 75 nM
dissociation constant in 300 mM KCl at 25 °C is estimated by curve
fitting to a Langmuir isotherm, consistent with published data
(14). Meanwhile, the 96A→G point mutant shows no
significant binding under the same conditions, consistent with its
~1000-fold weaker affinity for this U1A construct (14). Higher
nonspecific U1A-GFP binding to the beads causes broader
bead population distributions and is probably responsible for
the larger error bars observed at higher protein concentrations.
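As an illustration of the curve fitting step, the sketch below fits background-subtracted fluorescence to a Langmuir isotherm, F = Fmax · [P] / (Kd + [P]). It is not the pro Fit procedure used in the study: the grid search, the function names, and the synthetic data are assumptions, and the model treats free protein as equal to total protein, which is only reasonable when RNA is well below the protein concentration.

```python
def langmuir(protein_nM, kd_nM, f_max):
    """Langmuir isotherm: signal expected at a given protein concentration."""
    return f_max * protein_nM / (kd_nM + protein_nM)


def subtract_background(raw, background):
    """Subtract the no-RNA signal measured at each protein concentration."""
    return [r - b for r, b in zip(raw, background)]


def fit_langmuir(protein_nM, signal, kd_grid=None):
    """Least-squares fit by scanning Kd; Fmax is solved exactly for each Kd."""
    if kd_grid is None:
        kd_grid = [0.5 * i for i in range(1, 2001)]  # 0.5 to 1000 nM
    best = None
    for kd in kd_grid:
        x = [p / (kd + p) for p in protein_nM]
        denom = sum(xi * xi for xi in x)
        if denom == 0:
            continue
        # For fixed Kd the model is linear in Fmax, so this is closed form.
        f_max = sum(xi * yi for xi, yi in zip(x, signal)) / denom
        sse = sum((yi - f_max * xi) ** 2 for xi, yi in zip(x, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, f_max)
    if best is None:
        raise ValueError("no usable data points")
    return best[1], best[2]


# Synthetic, noise-free titration generated with Kd = 75 nM and Fmax = 100.
protein = [10, 25, 50, 100, 200, 400]
raw = [langmuir(p, 75.0, 100.0) + 5.0 for p in protein]
background = [5.0] * len(protein)

kd_fit, f_max_fit = fit_langmuir(protein, subtract_background(raw, background))
print(kd_fit, f_max_fit)
```

With real data a nonlinear least-squares routine that also reports parameter uncertainties would be preferable; the grid scan is used here only to keep the example dependency-free.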

The observed GFP fluorescence is also sensitive to the salt
concentration. At higher KCl concentrations, the GFP
fluorescence intensity increases by ~25% at each protein
concentration. However, the dissociation constant shifts from ~35 to
~75 nM. These observations reflect stronger hybridization and
weaker U1A interaction at higher salt concentrations. In sum,
these data demonstrate the specificity and affinity of the U1A
RNA-protein interaction on beads.

Fig. 3. Mismatches are discriminated when monitoring hybridization indirectly through U1A-GFP binding. a, the bar graph shows the
mean fluorescence of observed GFP signal of bead populations. Fluorescence intensity is in arbitrary units. Triplicate data are averaged, and
the error bars represent standard deviations. When the standard deviation is less than 1, no error bar is shown. Binding reactions include 75
nM U1A-GFP and 6 nM U1A test RNA. b, compensatory RNA mutations restore U1A-GFP binding for an oligonucleotide point mutant. Oligo
2.17 hybridizes to the RNA and shows U1A-GFP binding, while 2C.17 does not. 77C→G, which is an exact match to oligo 2C.17, disrupts
hybridization to oligo 2.17 and allows hybridization to oligo 2C.17. Binding reactions include 10 nM U1A RNA and 100 nM U1A-GFP. Error bars
represent standard deviations of duplicates. c, plots show Texas Red fluorescence on the y axis and GFP fluorescence on the x axis. Panels
A-C show the nonspecific binding of Texas Red RNA and U1A-GFP to the beads. Panels D-F illustrate the poor discrimination of 17-mer oligos

Fig. 4. Bead assay accurately reflects the RNA-protein interaction. A,
percentages of beads shifted above background are tabulated into different
categories: beads with Texas Red fluorescence only and beads with GFP
fluorescence. U1A-GFP does not bind the 96A→G point mutant RNA, but
oligonucleotides still significantly hybridize to the RNA as indicated by the
Texas Red fluorescence. Error bars represent standard deviations of
triplicate experiments. B, titration of U1A-GFP into 3 nM RNA generates
binding curves. Squares represent the 96A→G point mutant, and circles
represent the wild-type U1A RNA. Background fluorescence (I0) in the
absence of RNA is subtracted for each protein concentration. Binding
curves were performed in duplicate and averaged. Error bars represent
standard deviations. Curve fitting suggests U1A-GFP is binding with a
dissociation constant of ~75 nM in 300 mM KCl at 25 °C.

Specific Binding in Mixed Populations— For screening
RNA-protein interactions with the bead assay, RNAs will have to be
identified from complex mixtures of RNAs. To determine
whether total RNA can compete with U1A-GFP binding, yeast
RNA was added to the binding reactions. Human U1A does
not specifically bind any yeast RNA (17, 18). Only in the
presence of the U1A test RNA is GFP fluorescence detectable
with oligo 2.17 as shown in Fig. 5A. Oligonucleotides not
complementary to the U1A RNA such as oligo 4.17 show
fluorescence equivalent to background. This suggests that
even in contexts where there might be high nonspecific
binding, specific binding is observable. These experiments were
performed with larger 5.7-µm diameter beads because the
sensitivity is higher (data not shown). The nonspecific binding
on these larger beads is also higher as more GFP
fluorescence is observed. However, specific GFP fluorescence is
observed at lower concentrations of U1A-GFP compared with
the smaller 2.8-µm microbeads.


Fig. 5 panels: A, yeast RNA; B, HeLa RNA; x axis, concentration.

Fig. 5. U1A-GFP binding to the U1A test RNA and to U1 snRNA
is detectable in total RNA backgrounds. A, oligo 2.17 but not oligo
4.17 detects U1A-GFP signal in 0.1 µg/µl yeast total RNA and 25 nM
U1A-GFP. Fluorescence intensities are higher with larger 5.7-µm
beads. Data are in triplicate with standard deviation error bars. B,
conversely, oligo 4.17 shows GFP fluorescence in HeLa total RNA.
U1A-GFP is binding to snRNA in the total HeLa RNA. Oligo 4.17 also
shows higher GFP fluorescence than a mismatch, oligo 4G.17, or a
noncomplementary sequence, oligo 2.17. Reactions include 0.16
µg/µl HeLa total RNA (40 µg total) and 50 nM U1A-GFP. Error bars
represent standard deviations of triplicates. Background signals were
subtracted for this graph.

To determine whether RNAs in cellular RNA preparations
can be identified with microbeads, total RNA was
isolated from HeLa cells and mixed with U1A-GFP. In HeLa RNA,


Fig. 3 (continued). binding to the RNA in the absence of U1A-GFP. Oligonucleotide mismatches do not significantly affect the observed Texas Red signal. Only the
wild-type oligonucleotide shown in panel G gives significant GFP fluorescence with 100 nM U1A-GFP and 30 nM U1A RNA, while the mismatches, in
panels H and I, show some Texas Red signal but no significant GFP fluorescence.


Molecular  &  Cellular  Proteomics  1.12  927 


RNA-Protein  Interactions  by  Flow  Cytometry 


U1A binds to U1 snRNA and its own mRNA. An
oligonucleotide complementary to snRNA, oligo 4.17, shows higher GFP
fluorescence compared with a mismatch, oligo 4G.17, and a
noncomplementary oligonucleotide, oligo 2.17, as illustrated
in Fig. 5B. The U1 snRNA concentration is ~1-5 nM in these
experiments. The observed GFP fluorescence is HeLa
RNA-dependent as the signal varies with HeLa RNA concentration
but remains unchanged with increasing equivalent
concentrations of yeast RNA. In sum, these data demonstrate that a
specific RNA can be identified with microbeads from total
RNA.

An additional requirement for screening with
oligonucleotide bead libraries is the ability to detect a small percentage of
oligonucleotide beads from a large background of beads that
do not bind. Initial experiments diluting the oligonucleotide
beads 100-fold suggest that even at low oligonucleotide
concentrations the RNA-protein complex can be detected.
Further experiments diluting oligo 2.17 in a large excess of oligo
4.17 demonstrated that two populations of beads could be
differentiated by GFP fluorescence (data not shown). These
data also demonstrate that the observed shifts are sufficient
to identify an RNA-protein complex.

DISCUSSION 

We describe a system to monitor RNA-protein interactions
in solution with microbeads using flow cytometry. We
demonstrate the versatility of the approach for 1) discriminating
between mismatches in the oligonucleotides, 2) mapping
protein recognition sites on RNA, 3) differentiating specific and
nonspecific binding RNAs, and 4) detecting specific RNAs in
complex mixtures. Importantly, specific binding can be
detected in high nonspecific RNA backgrounds, and the system
can discriminate a nonspecific binding point mutant at a
variety of protein and RNA concentrations. Because flow
cytometry is used to monitor the fluorescence on distinct
oligonucleotide-coupled microbeads, the system is amenable to
high throughput, genomic scale screening of RNA-protein
interactions.

With the U1A interaction, we have determined the
fundamental requirements for using this microbead system for
screening RNA-protein interactions. The approximately 2-fold
changes in observed GFP fluorescence are sufficient to
distinguish specific protein binding to RNA from background.
Future versions of the system may have increased sensitivity
and dynamic range by using brighter fluorophores and
microbeads with higher oligonucleotide densities. Furthermore,
because the system is monitoring the protein binding to a
distribution of thousands of microbeads, the fluorescence
shifts are more significant than just monitoring the bulk signal.
Methods to analyze the distributions more quantitatively are
being developed. The microbead system also allows for the
measurement of relative affinities of a protein for its cognate
RNA, thereby distinguishing specific and nonspecific binding
during screening and reducing false positives. Two
approaches are possible. After initial screening at a particular
protein concentration, binding experiments will help
distinguish specific and nonspecific binding candidates.
Alternatively, screening at different protein concentrations could be
performed to determine relatively strong and weak binding
interactions.
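The rationale for screening at more than one protein concentration can be illustrated with a simple occupancy calculation. The sketch assumes single-site binding with free protein approximately equal to total protein; the 75 nM value is the dissociation constant reported here for the wild-type RNA, the 1000-fold weaker value corresponds to the 96A→G mutant, and the protein concentrations are arbitrary examples.

```python
def fraction_bound(protein_nM, kd_nM):
    """Fraction of RNA occupied by protein for simple single-site binding."""
    return protein_nM / (kd_nM + protein_nM)


KD_SPECIFIC_NM = 75.0              # wild-type U1A RNA, 300 mM KCl
KD_WEAK_NM = KD_SPECIFIC_NM * 1000  # ~1000-fold weaker point mutant

# Example protein concentrations (nM) chosen only for illustration.
for protein in (25.0, 75.0, 300.0):
    specific = fraction_bound(protein, KD_SPECIFIC_NM)
    weak = fraction_bound(protein, KD_WEAK_NM)
    print(f"{protein:6.1f} nM protein: specific {specific:.3f}, weak {weak:.4f}")
```

Under these assumptions the specific RNA is half occupied at 75 nM protein while the weak binder stays below 0.1% occupancy, so a candidate whose signal persists at low protein concentration is more likely to reflect a high-affinity interaction.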

The microbead-based system described here has a number
of advantages over other recently developed RNA-protein
screening strategies. It is rapid with few time-consuming
manipulations required. Also, unlike many in vivo strategies, there
are no limitations to the size of the RNA or its basic structural
features. It is often difficult to monitor the binding of large
RNAs directly because they do not migrate well in gel
electrophoresis for mobility shift studies. With the microbead
system, the only requirement is that oligonucleotides hybridize to
regions distant from the binding site. Most mRNAs have
multiple regions accessible to hybridization in physiological
conditions (19).

With the system presented here, hybridization is monitored
indirectly through the protein fluorescence. To observe
protein fluorescence, the oligonucleotides need to hybridize to
particular RNAs that are also binding the protein. The
combination of observed RNA hybridization and protein
fluorescence on a microbead indicates that an RNA-protein complex
is present. Thus, many RNAs may be hybridizing to the beads,
but only when the fluorescently labeled protein is bound with
high affinity to one of these RNAs is positive signal observed.
This significantly reduces the nonspecific binding that would
be observed in identifying possible RNA targets compared
with other strategies that isolate all the RNAs bound to beads.
In its current form, the system does not reach equilibrium for
hours, presumably because of slow hybridization at
physiological temperatures. Smaller volumes may help reduce the
time required to reach equilibrium.

The mismatch discrimination observed with this bead
strategy may allow it to be adapted for single nucleotide
polymorphism analysis. Similar to assays such as the invasive
cleavage method (20), monitoring hybridization indirectly provides
a sensitivity enhancement to observe the subtle effects of
mismatches on hybridization. The enhanced mismatch
discrimination observed through the protein interaction may be
particular to the U1A system. Future studies of other
RNA-binding proteins will determine the generality of the observed
mismatch discrimination.

For genomic screening, proteins bound to RNA could be
challenged with large oligonucleotide coded bead libraries.
The coded beads would be sorted in a flow cytometer while
monitoring RNA and protein fluorescence to determine which
sequences are hybridizing to the RNA while preserving the
RNA-protein complex. This information can then be
compared with sequenced genomes to determine which RNAs are
binding and which sequences may be important for the
interaction. Various coding strategies are currently being
developed that do not require decoding or very large beads (9-12).

In sum, we have developed a microbead-based system to
determine which RNAs may be binding a particular protein as
well as which RNA sequences may be important for the
RNA-protein interaction. Many applications of the assay are
possible including binding in cell extracts, single nucleotide
polymorphism analysis, and monitoring the effects of small
molecules on RNA-protein complexes. Perhaps the most
inviting aspect of this system is to use large coded
oligonucleotide bead libraries to probe RNA-protein interactions on
genomic scales.